ABSTRACT

This paper considers a search problem in which the search is directed against a conscious evader
or an object controlled by a conscious evader. It is a two-person, zero-sum game called a search
evasion game. Although the searcher cannot observe any of the evader's actions, the evader can
observe the searcher's and can capitalize on any errors that the searcher makes.

At the beginning of the game, the evader hides in one of several boxes. The search process consists
of a sequence of looks into the various boxes until the evader is found. Each look into a
given  box  takes  a  fixed  amount  of  time.  If  the  searcher  looks  into  the  box  in  which  the  evader  is 
located,  he  will  find  the  evader  with  a  certain  probability  —  the  detection  probability  associated 
with  the  box  in  question.  A  particular  evasion  device  is  assumed:  the  evader  can  move  from  one 
box  to  another  between  looks.  A  cost  is  usually  associated  with  such  a  move. 

Primary emphasis is placed on the study of the search evasion game that involves two boxes, for
which solutions have been found. Two limiting forms of the two-box game are considered first. In
$G^\infty$, moving is prohibited. In $G^0$, the other limiting form, the evader can move at no cost.


The  game  becomes  more  interesting  when  a  nonzero  but  finite  cost  is  associated  with  each  move. 
In most cases, a finite prohibitive bound on the moving cost exists. When the moving cost exceeds
this bound, the searcher's good strategy is identical with his good strategy in $G^\infty$. The
evader  should  never  move  if  the  searcher  uses  this  strategy.  When  the  moving  cost  is  strictly  less 
than  the  prohibitive  bound,  the  searcher's  good  strategy  is  Markovian  in  form.  That  is,  the  good 
search  strategy  can  be  generated  by  a  finite  Markov  process  in  which  a  look  is  associated  with 
each  transition. 


The search evasion game that involves more than two boxes is also studied. In $G^0$, the limiting
form  in  which  the  moving  costs  are  equal  to  zero,  exact  solutions  can  still  be  found.  The  basic 
properties  of  the  other  limiting  game,  where  moving  is  prohibited,  are  simple  extensions  of  those 
that apply when there are only two boxes. In this game, however, the computational effort required
to find a solution can be excessive.

The  properties  of  the  general  many-box  game  in  which  the  moving  costs  are  neither  prohibitive 
nor  equal  to  zero  are  quite  different  from  those  that  apply  in  the  two-box  case.  Except  when  the 
moving costs are very small, the searcher's good strategy can no longer be generated by a Markov
process. The complex character of the game is indicated by the partial solution that has been
found to the simplest three-box game. The prospects of being able to find exact solutions to the
general game in an efficient manner appear to be remote. A particular approach to finding approximately
good search strategies is suggested for future research.


* This report is based on a thesis of the same title submitted to the Department of Electrical Engineering
at the Massachusetts Institute of Technology on 31 August 1962, in partial fulfillment of
the  requirements  for  the  degree  of  Doctor  of  Science. 

STUDIES IN SEARCH FOR A CONSCIOUS EVADER


CHAPTER  1 
INTRODUCTION 


1.1  HISTORY  OF  THE  PROBLEM 

Operations  research  was  first  recognized  as  a  formal  discipline  during  World  War  II  when 
scientific  methods  of  analysis  were  applied  to  operational  military  problems.  One  of  the  first 
endeavors  of  this  new  discipline  in  this  country  was  the  development  of  satisfactory  methods  for 
searching  for  enemy  submarines.  The  search  theory  that  evolved  considered  a  homogeneous 
environment  in  which  the  hidden  object  (submarine)  was  located.  From  this  theory  developed 
search  patterns  that  optimized  the  probability  of  detection  when  it  was  assumed  that  the  hidden 
object  was  either  stationary  or  moving  in  some  prescribed  manner  independent  of  the  search 
effort.  Since  then,  this  work  has  been  developed  further,  notably  by  Koopman. 

Search  problems  in  which  the  environment  cannot  be  approximated  by  a  homogeneous  one 
have  also  been  considered  by  assuming  a  discrete  environment.  In  these  problems,  the  search 
effort  consists  of  a  sequence  of  looks  into  various  boxes  in  which  the  object  may  be  hidden.  In 
many  cases,  the  time  required  to  examine  a  given  box  is  fixed  or  "quantized." 

Gluss considers a problem of this type in the process of developing sequences for testing
the various subassemblies of a complex system in order to find a faulty component. In his work,
he assumes that on each look the searcher either locates the object (faulty part) or gains no information.
In other words, the failure to find the object in a given box does not decrease the
probability  that  the  object  is  there. 

Pollock treats a similar problem, particularly the case in which there are only two boxes.
He assumes that when the correct box is examined, the object is found with probability $q_i$, where
$q_i$ is called the detection probability of the box in question. After an unsuccessful look, the
probability that the object is in the box just examined is decreased according to Bayes' rule. Many
of  Pollock's  results  are  found  in  Chapter  2.  In  particular,  he  originated  the  approach  used  in 
Sec.  2.3. 

Another discrete search problem is considered by Blackman. He studies a problem in
which  one  or  more  objects  appear  as  the  search  process  goes  along. 

A  feature  common  to  all  these  problems  is  that  the  object  never  moves  from  one  box  to 
another. 

Examples  can  be  found  in  both  continuous  and  discrete  search  problems,  however,  where 
the  object  need  not  be  stationary  or  move  in  some  arbitrary  manner.  Rather,  the  object  may  be 
an intelligent evader who attempts to outwit the searcher by moving so as to increase his chances
of escape. The search problem in such a situation should be treated as a game where the actions
of the evader are taken into account. Dubbins points out this consideration in a discussion of
the treatment of military problems associated with tactics, pursuit, evasion, search, and the
like  when  he  says,  "For  the  most  part  these  have  been  treated  as  'one-sided'  problems;  given 
prescribed  behavior  for  an  opponent,  one  seeks  to  optimize  the  result  of  his  own  actions.  With 
the  development  of  the  theory  of  zero-sum,  two-person  games,  however,  it  has  been  natural  to 
seek  extensions  to  the  more  realistic  'two-sided'  problems  in  which  each  of  the  two  participants 
is  free  to  choose  his  actions  from  a  non-trivial  class  of  possible  strategies."  This  is  a  valid 
statement,  and  several  of  the  problems  mentioned  above  have  been  considered  from  a  two-sided 
point  of  view.  The  author,  however,  has  been  unable  to  find  any  papers  in  which  the  theory  of 
two-sided  search  is  treated. 

This paper will consider a discrete two-sided search problem in which the looks are quantized.
To avoid the ambiguity of the term "two-sided search," it will be called a "search evasion
game."  Before  going  into  the  details  of  the  game  to  be  studied,  let  us  consider  a  particular 
example. 

1.2  THE  REVENUER  VS  THE  MOONSHINER:  AN  EXAMPLE 

In  a  particular  section  of  the  hills  of  Tennessee,  it  is  known  that  a  moonshiner  is  operating 
an  illegal  still.  As  a  result,  a  federal  agent  has  been  dispatched  to  the  scene  and  a  game  ensues. 
In  this  game,  we  assume  that  the  moonshiner  can  operate  his  still  in  any  of  several  locations  or 
areas  known  to  both  players.  Each  day,  the  revenuer  selects  one  of  these  areas  to  search,  and 
he  continues  his  hunt  until  he  catches  his  man.  Since  the  moonshiner  is  a  clever  fellow,  he  can 
conceal  his  apparatus  in  such  a  manner  that  he  will  not  necessarily  be  found  when  the  revenuer 
searches  the  area  in  which  he  is  located.  Rather,  he  will  be  found  with  a  certain  probability, 
the  detection  probability  of  that  area.  It  will  be  assumed  that  both  players  know  the  detection 
probability  of  each  of  the  various  areas. 

The  moonshiner,  being  a  rational  businessman  at  heart,  is  mainly  interested  in  securing  a 
good  profit  for  himself.  He  knows  that  his  still  will  yield  him  a  profit  of  one  unit  per  day. 

Through  spies,  or  by  other  means,  he  can  observe  where  the  revenuer  looks,  and  he  realizes 
that  he  can  prolong  the  expected  length  of  his  operation  by  changing  the  location  of  his  still  from 
time  to  time.  Since  the  revenuer  searches  during  the  day,  the  evader  knows  that  he  can  move 
his still with relative safety at night. However, when he moves it, he must suspend his operation
and destroy the material being processed at the time. This will cost him p units. We shall
assume  that  once  he  has  completed  his  move,  he  can  immediately  replace  the  process  materials 
so  that  his  future  production  is  not  affected.  Nonetheless,  our  entrepreneur  realizes  that  he 
suffers  a  loss  in  profit  whenever  he  moves  and  that  he  must  balance  this  moving  cost  against  the 
advantage  of  a  possibly  longer  career. 

The  revenuer  is  also  an  intelligent  man  who  has  specific  motives.  Since  none  of  the  areas 
has  a  detection  probability  equal  to  zero,  he  knows  that  he  can  eventually  catch  his  man.  However, 
in addition to catching criminals, he is interested in making this crime as unattractive as possible.
That is, he considers deterrence to be one of his primary functions. As a result, he is interested
in minimizing the expected profits which the moonshiner accumulates before he is caught.
Therefore,  the  two  players'  interests  are  directly  opposed. 

This  is  a  two-person,  zero-sum  game.  The  revenuer  is  a  searcher  and  the  moonshiner  is 
an  evader.  Each  of  the  areas  in  which  the  evader  can  hide  can  be  called  a  box,  and  each  box  has 
an associated detection probability. The search process consists of a sequence of looks into the
various boxes. The evader can observe where the searcher looks, and between each pair of looks
the  evader  can  move  from  one  box  to  another.  The  game  continues  until  the  evader  is  found. 

1.3  THE  SEARCH  EVASION  GAME  MODEL 

The  game  described  above  is  a  particular  example  of  the  search  evasion  game  to  be  studied 
in this paper. This game was motivated by a problem involving inspection under an arms control
agreement. Assume that the manufacture of certain weapons systems is prohibited by an
arms control treaty and that under this treaty an inspectorate is established to enforce the agreement.
One of the functions of this inspectorate would be to visit the various places where such
systems  could  be  manufactured.  The  purpose  would  be  twofold:  to  discourage  a  violation  of  the 
treaty  and  to  disclose  any  such  violation  if  it  occurs.  Although  the  inspectee  may  choose  to 
honor the treaty, the inspector has no reason to assume this. Furthermore, whether the inspector
wishes to deter a violation or minimize the possible advantages of such a violation, it is
reasonable  to  assume  that  his  opponent's  gain  is  his  loss.  This  results  in  a  two-sided,  zero-sum 
game  in  which  the  inspector  should  assume  that  a  clandestine  operation  exists.  When  it  does,  the 
game  becomes  interesting. 

Although the game to be studied was motivated by the arms control problem, no claims are
made as to the validity, in this context, of the model to be defined. Many simplifying assumptions
have been made. Furthermore, all political considerations have been ignored and an arbitrary
utility function is assumed. The result is a game that is studied for its own sake. It is a two-sided
extension of a more classic one-sided search problem. It will be interesting to see how a
particular evasion device, moving between looks, affects the behavior of the game. If the results
of this study can be applied to a practical problem, perhaps the one just mentioned, so
much the better.

In  our  search  evasion  game  there  are  two  players,  the  searcher  and  the  evader.  The  evader 
must hide at the beginning of the game in one of a set of boxes. The searcher must make a sequence
of looks into these boxes until he finds the evader. A look into a particular box takes a
particular amount of time, known to both players. If the searcher looks into box i and the evader
is there, the evader will be found with probability $q_i$, where $q_i$ is the detection probability of
box i. The detection probability of each box is known to both players. We shall always assume
that the evader can observe where the searcher looks. Unless a statement is made to the contrary,
we shall also assume that the evader can move from one box to another between looks. If
a cost is associated with such a move, this cost is known to both players.

To  complete  the  definition  of  this  game,  a  utility  function  over  the  possible  outcomes  of  the 
game  is  needed  for  each  player.  It  is  not  appropriate  to  develop  the  theory  of  utility  here.  A 
good  treatment  of  this  theory  as  well  as  the  theory  of  games  in  general  can  be  found  in  Ref.  9. 

We  shall  assume  that  the  utility  of  any  outcome  can  be  expressed  in  a  numerical  form  equivalent 
to  money.  The  game  is  zero-sum:  the  sum  of  the  two  players'  utilities  for  a  given  outcome  must 
equal  zero.  Thus,  one  player's  utility  is  the  negative  of  the  other's,  and  only  one  of  the  utilities 
need  be  considered  explicitly.  In  this  game,  the  evader's  utility  will  always  be  used. 

The search evasion game is of a sequential nature and may be thought of as a two-sided extension
of a sequential decision process. A particular play or outcome of the game consists of
a particular sequence of events. An unsuccessful look into box i while the evader hides in
box j is such an event. Similarly, an event occurs when the evader moves from one box to
another or when he chooses not to move between a given pair of looks. Associated with each
event is a "reward." This reward is equal to the amount that the event in question contributes to
the evader's utility. Thus, the utility of a given play of the game is equal to the sum of the rewards
associated with the events that occur. In the example of the revenuer vs the moonshiner,
a  reward  of  one  unit  was  associated  with  each  look.  This  reward  did  not  depend  upon  the  box  in 
which  the  evader  was  hiding  and  was  even  collected  on  the  final  look  when  the  evader  was  found. 

A reward was also associated with each move. This reward was equal to -p. If a reward is
negative, we may refer to the corresponding positive quantity as a "cost" or a "loss." Thus, in
the above example, a cost of p was associated with each move.

Both players may use strategies involving random decisions. Also, a stochastic element
is introduced by the detection probabilities of the various boxes. As a result, a particular play,
or sequence of events, cannot be associated with a given pair of strategies for the two players.
Rather, such a pair of strategies defines a probability distribution over the various plays that
can occur. By taking the expected value of the utilities associated with these plays over the above
probability distribution, a utility can be associated with a given pair of strategies. The evader's
utility will be called the "payoff" for the given pair of strategies. This payoff will be equal to the
expected value of the sum of the rewards associated with the various events that can occur. Since
the game is zero-sum, we see that the evader is interested in maximizing the payoff while the
searcher is interested in minimizing it.
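
The payoff just defined can be estimated numerically by playing the game many times and averaging the evader's utility. The following is only a minimal sketch under the simple reward structure of the moonshiner example (one unit per look, a cost of p per move). The function names, the callable interface for the two strategies, and the `max_looks` cut-off are illustrative assumptions and are not part of the original model.

```python
import random

def play_once(q, move_cost, searcher, evader, rng, max_looks=100_000):
    """Simulate one play and return the evader's utility (sum of rewards).

    q        -- detection probabilities, one per box (boxes indexed from 0)
    searcher -- callable(history, rng) -> box to examine next; it sees only
                its own past looks, never the evader's position or moves
    evader   -- callable(position, history, rng) -> box to occupy next;
                position is None for the initial hiding choice
    """
    history = []                                  # boxes examined so far
    position = evader(None, history, rng)         # initial hiding place
    utility = 0.0
    for _ in range(max_looks):
        box = searcher(history, rng)
        utility += 1.0                            # one unit per look, even the last
        if box == position and rng.random() < q[box]:
            return utility                        # evader found; the play ends
        history.append(box)
        new_position = evader(position, history, rng)
        if new_position != position:
            utility -= move_cost                  # reward of -p for each move
            position = new_position
    return utility                                # reached only at the safety cut-off

def estimate_payoff(q, move_cost, searcher, evader, plays=20_000, seed=0):
    """Monte Carlo estimate of the payoff for a given pair of strategies."""
    rng = random.Random(seed)
    total = sum(play_once(q, move_cost, searcher, evader, rng) for _ in range(plays))
    return total / plays
```

For instance, a searcher that alternates between two boxes can be written as `lambda history, rng: len(history) % 2`, and an evader that hides in box 0 and never moves as `lambda pos, history, rng: 0 if pos is None else pos`.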

We shall assume that payments are made while the game is being played. When an event
occurs, the searcher pays the appropriate reward to the evader. This will be necessary when
discounting is considered in Chapter 7. The actual transfer of a reward, however, cannot be
used to provide information to the searcher. For example, the searcher cannot infer
that a move occurred because he received p units. We think of the rewards as being transferred
from the searcher to the evader only because the searcher considers the evader's gain to be his
loss.

In  Chapters  2  through  7,  the  search  evasion  game  that  involves  only  two  boxes  will  be  studied 
in some detail, for the "good" strategies, or at least "$\epsilon$-good" strategies, can be found for the
two  players. 

In  Chapters  2  through  5,  a  simple  reward  structure  is  assumed.  The  evader  receives  one 
unit  for  each  look  and  pays  p  units  for  each  move.  In  Chapter  2,  the  game  in  which  moving  is 
prohibited  (where  the  evader  chooses  only  where  he  hides)  is  considered.  In  Chapter  3,  the  other 
limiting  form  of  the  game  is  considered  —  the  game  in  which  p,  the  moving  cost,  is  equal  to  zero. 
In  Chapters  4  and  5,  the  more  general  game  where  p  is  finite  but  unequal  to  zero  is  treated.  In 
Chapter  4,  the  evader's  good  strategy  is  developed;  in  Chapter  5,  the  searcher's. 

A more general reward structure is assumed for the two-box game in Chapter 6. The reward
associated with a given look depends on where the look is made and also on where the evader
is  hiding.  The  moving  cost  depends  on  which  move  occurs.  Finally,  a  detection  loss  is  subtracted 
from  the  evader's  payoff  when  he  is  found.  This  loss  can  depend  on  where  the  evader  is  hiding 
when  detection  occurs. 

In Chapter 7, discounting is introduced. With discounting, the utility of a given event decays
with the time that elapses before the event occurs. Discounting
is useful in situations where immediate rewards are more important than rewards delayed until
the distant future.

In Chapter 8, the N-box form of the search evasion game is considered. When moving is
prohibited, the general properties of this game are simple extensions of those of the two-box
form. The computational effort required to find the good strategies, however, becomes far
greater. When the moving costs are all equal to zero, the game is fairly simple, and exact solutions
can be obtained. The good search strategy of this game will prove to be of special interest.
The general N-box game in which the moving costs are unequal to zero but finite becomes very
complex and the general approach used in the two-box game breaks down. The partial solution
of a very simple example indicates the complex character of the general N-box game. It is believed
that only approximate solutions are feasible, and a particular approach to finding an approximately
good search strategy is suggested for future research.

CHAPTER 2

$G^\infty$: THE SEARCH EVASION GAME WITH MOVING PROHIBITED


2.1  INTRODUCTION 

In this chapter we shall study the search evasion game in which the evader is not allowed to
move between looks. We should expect that in G, the game without this restriction, the evader
would not move if the moving cost p became sufficiently large. The game where moving is prohibited
will, therefore, be called $G^\infty$. As a limiting form, its properties should be of interest,
and we shall see that the mathematical techniques developed in its study will be useful in the
more general game.

The rules for $G^\infty$ can be stated simply. The evader may hide in either of two boxes or may
make any random choice between them. The searcher must pay the evader one unit for each look
and must continue his search until he finds the evader. The payoff of the game for a given pair
of strategies is equal to the expected search time. Associated with each box i is a detection
probability $q_i$ known to both players. It is the probability that a look into that box will reveal the
evader's presence if he is there.

A strategy for the evader can be defined by the probability vector $P = \{p_1, p_2\}$ that determines
where he hides. The evader is not required to reveal the vector he uses to the searcher.
A fundamental theorem from the theory of games states, however, that if both players use good
strategies neither suffers any disadvantage by revealing his strategy to the other. As a result,
we shall study the modified game $F^\infty$ in which the searcher is informed of the vector that the
evader selects. This study will compose the bulk of this chapter, for once the solution of $F^\infty$ is
known the solution of $G^\infty$ is simple.

2.2 MODIFIED GAME $F^\infty$: EVADER'S STRATEGY KNOWN TO SEARCHER

With the evader required to reveal his strategy, $F^\infty$ almost ceases to be a game. Once
the evader has hidden, he no longer has any control over his fate, and the searcher is simply
faced with a problem of optimization. With this as a rationale, we will adopt the convention of
using the term "optimum strategy" in place of "good strategy" in reduced games of this type
where one player is required to reveal his strategy to the other. The term "good strategy" will
be reserved for use in the original game in which this requirement does not apply.

An optimum strategy for the searcher in $F^\infty$ may be thought of as a rule for generating, as
a function of P, a search sequence that minimizes the expected search time. Since we are considering
only two boxes, P has only one degree of freedom, and it is convenient to adopt the notation
P = {P, 1 - P}. The symbol P equals the probability that the evader will hide in box 1,
and we can let $U^\infty(P)$ represent the expected search time that results when the searcher uses an
optimum strategy. As we shall soon see, the optimum search strategy is simple in that $U^\infty(P)$
need not be known numerically in order to determine this strategy.

On the other hand, the evader's optimum strategy in $F^\infty$ (his good strategy in $G^\infty$) consists
in selecting that P at which $U^\infty(P)$ is a maximum. For this reason, and because $U^\infty(P)$ is needed
in order to determine the good search strategy in $G^\infty$, it must be calculated. This calculation is
fairly involved and will be considered in some detail. The techniques developed will be useful in
the more general game where moving is allowed at a cost.


In the process of looking for the evader, the searcher must make a sequence of decisions
as to where he should look, until the evader is found. Since the evader may take no countermeasures
once he has hidden, these decisions can be made in advance and can be deterministic.
Since the game ends with the first successful look, an optimum strategy may be viewed as associating
with each P a single infinite look sequence that is used as long as necessary.

Although we can expect the optimum search sequences to be different for different values
of P, it is worthwhile to see how the expected search time, or payoff, associated with a fixed
sequence behaves as a function of P. Let a represent the payoff that results if the evader actually
hides in box 1 (P = 1) and let b represent the corresponding payoff if the evader hides in
box 2 (P = 0) when a fixed sequence is used. With this sequence, the payoff as a function of P
is [aP + b(1 - P)] and is linear. Consider the ensemble of linear functions generated by the
infinite set of all infinite search sequences. The expected search time $U^\infty(P)$ must be the greatest
lower bound on this ensemble. Therefore, $U^\infty(P)$ must be continuous and convex. Throughout
this paper a function f(x) will be considered convex if for all $x_1, x_2 \in X$ and $0 \le y \le 1$,

$$y f(x_1) + (1 - y) f(x_2) \le f[y x_1 + (1 - y) x_2]$$
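
The linearity of a fixed sequence's payoff and the shape of the lower bound can be checked numerically. The sketch below is an illustration only: it restricts attention to sequences that repeat a finite cycle of looks forever, so the pointwise minimum it computes is merely an upper bound on the optimum expected search time, not the optimum itself. The function names and the 0/1 indexing of the two boxes are assumptions made for the example.

```python
from itertools import product

def expected_looks(cycle, box, q):
    """Expected number of looks when the evader sits in `box` (0 or 1) and
    the searcher repeats `cycle` (a tuple of box indices) forever."""
    if box not in cycle:
        raise ValueError("cycle never examines this box; expected time is infinite")
    r = 1.0 - q[box]                 # escape probability of the occupied box
    survive, total = 1.0, 0.0
    for b in cycle:
        total += survive             # Pr(this look is reached during the first cycle)
        if b == box:
            survive *= r
    return total / (1.0 - survive)   # geometric sum over repeated cycles

def payoff_line(cycle, q):
    """Return (a, b): payoffs when the evader is in box 1 (P = 1) and box 2 (P = 0)."""
    return expected_looks(cycle, 0, q), expected_looks(cycle, 1, q)

def cycle_lines(q, max_len):
    """Payoff lines for every cycle of length <= max_len that examines both boxes."""
    lines = []
    for n in range(2, max_len + 1):
        for cycle in product((0, 1), repeat=n):
            if 0 in cycle and 1 in cycle:
                lines.append(payoff_line(cycle, q))
    return lines

def lower_envelope(P, lines):
    """Pointwise minimum over the linear payoffs a*P + b*(1 - P)."""
    return min(a * P + b * (1.0 - P) for a, b in lines)
```

Because each member of the family is linear in P, their pointwise minimum satisfies the inequality in the definition of convexity used in this paper, whichever cycles are included.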

In many cases, we shall find that $U^\infty(P)$ is piecewise linear over any interval $(\epsilon, 1 - \epsilon)$ where
$\epsilon > 0$. That is, if we exclude the intervals $(0, \epsilon)$ and $(1 - \epsilon, 1)$ from the interval (0, 1) over which
P is defined, the remainder $(\epsilon, 1 - \epsilon)$ can be partitioned into a finite set of nonzero intervals
over each of which $U^\infty(P)$ is linear. The quantity $\epsilon$ must be strictly greater than zero, for the
linear intervals will always become arbitrarily short as P approaches zero or one. If $U^\infty(P)$ is
linear over a nonzero interval, a single infinite search sequence is optimum over this interval,
and $U^\infty(P)$ equals the payoff associated with this sequence over this interval. At a point where
$U^\infty(P)$ is formed by the intersection of two linear functions, the sequences associated with both
functions are optimum.

Some additional properties of $U^\infty(P)$ are easily shown. Since at least one look is required
to find the evader, $U^\infty(P)$ must be positive. Furthermore, it must be bounded as long as the
detection probabilities are all unequal to zero. This will always be assumed, since the game
holds little interest otherwise. If the evader is known to be in a particular box, it is clear that
the searcher should always look there. The expected search time is then equal to the reciprocal
of the detection probability and we find that $U^\infty(0) = 1/q_2$ and $U^\infty(1) = 1/q_1$.
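
The reciprocal relationship is the mean of a geometric distribution. As a short check, suppose the evader is known to be in box i, every look is made into that box, and each look succeeds independently with probability $q_i$. Writing $r_i = 1 - q_i$ and letting N denote the number of looks required (a symbol introduced here only for this derivation), we have

$$E[N] = \sum_{n=1}^{\infty} n \, q_i \, r_i^{\,n-1} = \frac{q_i}{(1 - r_i)^2} = \frac{1}{q_i}$$

For example, a detection probability of 1/4 gives an expected search time of four looks.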

2.3  DYNAMIC  PROGRAMMING  WITH  P  AS  A  STATE  VARIABLE 

The function $U^\infty(P)$ has been defined as the payoff of $F^\infty$ that applies when the evader hides
in box 1 with probability P and in box 2 with probability (1 - P) and the searcher uses an optimum
search sequence. Since the searcher knows the a priori probability distribution defining
the evader's position at the beginning of the game, he can calculate the a posteriori probability
distribution that applies after a sequence of unsuccessful looks. Therefore, at any point in the
game, we can use P = {P, 1 - P} to represent the probability distribution defining the evader's
position at that time. The searcher's future behavior should depend only upon the value of P
which applies at a given time. That is, the searcher's future sequence of looks should be the
same as the entire sequence that would apply if the evader had originally used this P in hiding
at the beginning of the game. The probability P can be treated as a state variable and $U^\infty(P)$
can be used to represent the future payoff that results if the searcher uses his optimum strategy
for all future looks.

The manner in which P changes during the search process is easily shown. Let us adopt
the notation $P \xrightarrow{i} P'$ to indicate that P is transformed into P' by an unsuccessful look into
box i. Then,

$$P \xrightarrow{1} P' = \frac{P r_1}{P r_1 + (1 - P)}, \qquad r_1 = 1 - q_1 \tag{2-1}$$

$$P \xrightarrow{2} P' = \frac{P}{P + (1 - P) r_2}, \qquad r_2 = 1 - q_2 \tag{2-2}$$


If a sequence involving more than one look transforms P into P', this sequence can be written
over the arrow in a similar manner. However, the final transformation depends only on the
total number of looks into each box and not on their order, and it is often convenient to represent
the transformation that involves $k_1$ looks into box 1 and $k_2$ looks into box 2 by
$P \xrightarrow{(k_1, k_2)} P'$. Then,

$$P \xrightarrow{(k_1, k_2)} P' = \frac{P r_1^{k_1}}{P r_1^{k_1} + (1 - P) r_2^{k_2}} \tag{2-3}$$
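
The three transformations are Bayes' rule applied to an unsuccessful look, and the claim that the order of the looks does not matter can be verified directly. A minimal sketch follows. The function names are illustrative, and it is assumed that the unsuccessful look has nonzero probability, so that the denominators do not vanish.

```python
def update(P, box, q1, q2):
    """Probability that the evader is in box 1 after one unsuccessful look
    into `box` (1 or 2), following Eqs. (2-1) and (2-2)."""
    r1, r2 = 1.0 - q1, 1.0 - q2           # escape probabilities
    if box == 1:
        return P * r1 / (P * r1 + (1.0 - P))
    if box == 2:
        return P / (P + (1.0 - P) * r2)
    raise ValueError("box must be 1 or 2")

def update_counts(P, k1, k2, q1, q2):
    """Probability that the evader is in box 1 after k1 unsuccessful looks
    into box 1 and k2 into box 2, in any order, following Eq. (2-3)."""
    r1, r2 = 1.0 - q1, 1.0 - q2
    weight1 = P * r1 ** k1                 # unnormalized weight of box 1
    weight2 = (1.0 - P) * r2 ** k2         # unnormalized weight of box 2
    return weight1 / (weight1 + weight2)
```

Applying `update` for box 1 and then box 2, or in the reverse order, gives the same value as `update_counts` with one look into each box, up to floating-point rounding.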


In the above expressions, $r_1$ and $r_2$, the complements of the detection probabilities $q_1$ and
$q_2$, are used. We shall call these complements the escape probabilities. If the evader is hiding
in box i and the searcher looks into that box, the evader will escape detection (will not be found)
with probability $r_i$. Although only the detection probabilities are needed, we shall find it convenient
to use both the r's and the q's in our expressions, with the condition $q_i + r_i = 1$ implied.

We are now in a position to write the fundamental functional equation. If the searcher looks
into box 1, the evader receives one unit for the look and survives this look with probability
$(P r_1 + 1 - P)$. When this occurs, P transforms into $P r_1/(P r_1 + 1 - P)$. Similar conditions apply
if the searcher looks into box 2. Letting $U^\infty(P; i)$ represent the payoff if box i is examined
first, after which an optimum search strategy is employed, we have

$$U^\infty(P) = \min \begin{cases} U^\infty(P; 1) = 1 + [P r_1 + 1 - P]\, U^\infty\!\left[\dfrac{P r_1}{P r_1 + 1 - P}\right] \\[2ex] U^\infty(P; 2) = 1 + [P + (1 - P) r_2]\, U^\infty\!\left[\dfrac{P}{P + (1 - P) r_2}\right] \end{cases} \tag{2-4}$$


This functional equation makes two problems apparent. In order to find the searcher's optimum
strategy we must find which of $U^\infty(P; 1)$ and $U^\infty(P; 2)$ is smaller for a given P. Once this
is known, we are still faced with the problem of evaluating $U^\infty(P)$, for Eq. (2-4) expresses $U^\infty(P)$
as a function of another unknown $U^\infty(P')$. Unless P eventually transforms back into itself after
a finite number of optimum looks, $U^\infty(P)$ must be evaluated by means of an infinite series.
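One way to get a feeling for Eq. (2-4) is to cut the recursion off after a fixed number of looks. The sketch below is an illustration only: the name `truncated_value` and the cut-off `depth` are assumptions, and the result counts only the looks made within the first `depth` looks, so it approaches the infinite-series payoff from below as `depth` grows.

```python
def truncated_value(P, q1, q2, depth):
    """Minimum expected number of looks made during the first `depth` looks."""
    if depth == 0:
        return 0.0
    r1, r2 = 1.0 - q1, 1.0 - q2
    best = float("inf")
    # (survival probability, numerator of the posterior) for a look into box 1, then box 2
    for survive, numer in ((P * r1 + (1 - P), P * r1), (P + (1 - P) * r2, P)):
        future = 0.0
        if survive > 0.0:
            future = survive * truncated_value(numer / survive, q1, q2, depth - 1)
        best = min(best, 1.0 + future)      # one unit for the look itself
    return best

# Identical boxes with q = 1/2 and P = 0.6 (assumed illustrative values).
print(truncated_value(0.6, 0.5, 0.5, 16))
```

The recursion visits both branches at every level, so its cost doubles with each extra look; it is meant as a check on small cases rather than as a practical method.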

At present we are in a position to derive the searcher's optimum strategy.† It is clear that
if P = 1, the searcher should look into box 1, and if P = 0 he should look into box 2. It is
reasonable to assume that there exists a $P_0$ such that if P is greater than $P_0$ the searcher should
look into box 1, whereas if it is less than $P_0$ he should look into box 2. When $P = P_0$ we should
expect that the searcher can look into either box. A look into box 1, however, decreases P and
requires that the next look be into box 2. Similarly, if the first look is into box 2, P will
become greater than $P_0$ and the next look should be into box 1. Letting $U^\infty(P; 12)$ represent the
payoff when the searcher looks into box 1 and then into box 2, after which an optimum strategy
is employed, we have

$$U^\infty(P; 12) = 1 + [P r_1 + 1 - P] + [P r_1 + (1 - P)\, r_2]\; U^\infty\!\left(\frac{P r_1}{P r_1 + (1 - P)\, r_2}\right). \tag{2-5}$$

In a similar manner,

$$U^\infty(P; 21) = 1 + [P + (1 - P)\, r_2] + [P r_1 + (1 - P)\, r_2]\; U^\infty\!\left(\frac{P r_1}{P r_1 + (1 - P)\, r_2}\right). \tag{2-6}$$

Note that both equations contain the same expression

$$[P r_1 + (1 - P)\, r_2]\; U^\infty\!\left(\frac{P r_1}{P r_1 + (1 - P)\, r_2}\right)$$

for the expected future payoff after the first two looks. The equation

$$U^\infty(P_0) = U^\infty(P_0; 12) = U^\infty(P_0; 21)$$

cancels these terms and leaves $P_0 r_1 + 1 - P_0 = P_0 + (1 - P_0)\, r_2$. Since $r_i = 1 - q_i$, this
reveals that $P_0 q_1 = (1 - P_0)\, q_2$, or

$$P_0 = \frac{q_2}{q_1 + q_2}. \tag{2-7}$$

† A rigorous proof is contained in Appendix A. A more general form of $\Gamma^\infty$ is treated, and its
reading should be deferred until after one has read Chapters 6, 7 and 8.

The searcher's optimum strategy therefore requires that

$$\begin{aligned}
&\text{if } P > P_0 = \frac{q_2}{q_1 + q_2}: &&\text{look into box 1} \\
&\text{if } P < P_0: &&\text{look into box 2} \\
&\text{if } P = P_0 = \frac{q_2}{q_1 + q_2}: &&\text{look into either box}
\end{aligned} \tag{2-8}$$


Noting that $P = \{p, 1 - p\} = \{p_1, p_2\}$, we see that the searcher's optimum strategy requires him
to look into the box for which $p_i q_i$ is the larger. That is, the searcher should make the choice
that maximizes the probability of finding the evader on the next look.
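The rule can be sketched as follows. The names `next_box` and `payoff`, the tie-breaking choice (ties go to box 1), and the stopping tolerance are assumptions made for illustration. The payoff is accumulated as the sum, over looks, of the probability that the look is actually made, which equals the expected number of looks.

```python
def next_box(P, q1, q2):
    """Box with the larger immediate detection probability (ties go to box 1)."""
    return 1 if P * q1 >= (1 - P) * q2 else 2

def payoff(P, q1, q2, tol=1e-12, max_looks=10_000):
    """Expected number of looks, and the look sequence, when the rule above is followed."""
    r1, r2 = 1 - q1, 1 - q2
    in1, in2 = P, 1 - P              # joint probability of (evader in box i, not yet found)
    total, looks = 0.0, []
    while in1 + in2 > tol and len(looks) < max_looks:
        total += in1 + in2           # one unit for every look that is reached
        posterior = in1 / (in1 + in2)
        box = next_box(posterior, q1, q2)
        looks.append(box)
        if box == 1:
            in1 *= r1                # evader in box 1 escapes with probability r1
        else:
            in2 *= r2
    return total, looks

value, looks = payoff(0.6, 0.5, 0.5)     # assumed illustrative values
print(value, looks[:6])
```

Carrying the two unnormalized joint probabilities, rather than renormalizing P after each look, keeps the survival probability available at no extra cost.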

This strategy is, in a sense, a deterministic form of a behavioral strategy, and it is worthwhile
here to contrast behavioral strategies with pure and mixed strategies. Let us consider a
game tree. Each node of the tree corresponds to a move for one of the players. Each branch
extending from the node represents one of the possible alternatives that the player can select on
that move. The nodes of the tree are partitioned into a set of information sets. For all nodes
in a given information set, the same information concerning the past play is available to the player
whose move it is. All moves in a given information set must have the same alternatives.

A pure strategy may be thought of as a list that specifies the alternative which should be
selected in each information set. A mixed strategy specifies a probability distribution over the
set of pure strategies. Thus, when a player uses a mixed strategy, he selects a pure strategy
at the beginning of the game by means of this probability distribution. Once this selection has
been made, he selects his alternatives throughout play deterministically. In our game, an
infinite search sequence is a pure strategy and a random selection of an infinite search sequence
is a mixed strategy.

In a behavioral strategy, a player associates with each information set a probability
distribution for selecting his next alternative. Thus, when a behavioral strategy is used, random
decisions are employed throughout play and not just at the beginning. Behavioral strategies are
completely general as long as the players have perfect recall, which they do in this game.

In succeeding chapters, we shall find that we can characterize our information sets by means
of state variables. The strategies that we shall develop will be formulated in terms of decision
rules which are functions of these state variables. They will, therefore, be behavioral strategies.
In $\Gamma^\infty$, the searcher's optimum decision rule is a deterministic function of P and therefore, in
a rigorous sense, is a pure strategy. As our study develops, however, we shall find that
behavioral strategies are employed increasingly.

2.4 EXAMPLE WHERE BOXES ARE IDENTICAL: $q_1 = q_2 = q$

When the two boxes are identical the searcher's strategy is very simple. $P_0 = 1/2$ and the
searcher should always look where the evader is most likely to be. We also have the good
fortune in this example to find that if the searcher looks first into one box and then into the other,
P returns to its initial value, that is, $P r_1/[P r_1 + (1 - P)\, r_2] = P$. Clearly, if such a pair of
looks is optimum, it should be repeated, and the total optimum sequence should consist of
alternate looks into the two boxes.

In order to find when such a sequence is optimum, let us define $P_{01}$ and $P_{02}$ as the
probabilities into which $P_0$ is transformed by an unsuccessful look into box 1 and box 2, respectively.
Then,

$$P_0 \xrightarrow{\,1\,} P_{01} = \frac{P_0 r_1}{P_0 r_1 + 1 - P_0}$$

and

$$P_0 \xrightarrow{\,2\,} P_{02} = \frac{P_0}{P_0 + (1 - P_0)\, r_2}.$$

These probabilities will be of use in the more general case where $q_1 \neq q_2$, and the definition
given above should be kept in mind. In this example, however, $P_{01} = r/(1 + r)$, $P_{02} = 1/(1 + r)$,
and we find that


If P belongs to the interval $(P_{01}, P_0)$ the searcher should look into box 2, and P transforms into
$(P_0, P_{02})$. A look into box 1 is then called for, since $P > P_0$, and P transforms back to its
$$U^\infty(P) = \frac{2}{q} + k - (1 - P)\left(\frac{1 - r^{k+1}}{q} + k\right), \qquad P \in (P_{-k-1}, P_{-k}),$$

where

$$P_{-k} \xrightarrow{(0,\,k)} P_0$$

or

$$P_{-k} = \frac{r^k}{1 + r^k}.$$

The optimum sequence in the interval $(P_{-k-1}, P_{-k})$ consists of k looks into box 2 followed by
2121....

This completes the solution. The function $U^\infty(P)$ is graphed in Fig. 1 for the case where
q = 1/2.
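The closed form for identical boxes can be checked numerically. The sketch below assumes that the payoff on $(P_{-k-1}, P_{-k})$ reads $2/q + k - (1 - P)[(1 - r^{k+1})/q + k]$ with $P_{-k} = r^k/(1 + r^k)$, and that the payoff is symmetric about P = 1/2 because the boxes are identical. The function names and the test points are assumptions chosen for illustration.

```python
def direct_payoff(P, q, n_looks=400):
    """Expected number of looks when the searcher always looks where the evader is more likely."""
    r = 1 - q
    in1, in2, total = P, 1 - P, 0.0      # unnormalized (box, still hidden) probabilities
    for _ in range(n_looks):
        total += in1 + in2
        if in1 >= in2:
            in1 *= r
        else:
            in2 *= r
    return total

def closed_form(P, q):
    """Piecewise-linear payoff for identical boxes (assumed form, see the lead-in)."""
    r = 1 - q
    if P > 0.5:                          # identical boxes: U(P) = U(1 - P)
        return closed_form(1 - P, q)
    k = 0
    while P < r ** (k + 1) / (1 + r ** (k + 1)):   # move left until P >= P_{-(k+1)}
        k += 1
    return 2 / q + k - (1 - P) * ((1 - r ** (k + 1)) / q + k)

for P in (0.05, 0.3, 0.45, 0.6, 0.9):
    print(P, direct_payoff(P, 0.5), closed_form(P, 0.5))
```

The two columns should agree to within the truncation error of the 400-look sum, which is negligible for q = 1/2.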


Fig. 1. $U^\infty(P)$: $q_1 = q_2 = 1/2$. (Horizontal axis: P, from 0 to 1.)

2.5 FURTHER PROPERTIES OF $\Gamma^\infty$

Although the previous example is rather special, many of its characteristics are typical of
the more general case where $q_1 \neq q_2$. In this more general case, $U^\infty(P)$ is convex. If two
sequences are optimum at the same point P, this point must, in general, be transformed into $P_0$
by both sequences. Until this transformation occurs (for the first time), both sequences must
be identical. In the case where $U^\infty(P)$ is piecewise linear over $(\epsilon, 1 - \epsilon)$, such a point must be
a breakpoint where two linear intervals intersect. The optimum sequences associated with both
intervals must be optimum at this point. In the previous example, $P_3$ was a breakpoint and it
was transformed into $P_0$ by three optimum (and unsuccessful) looks into box 1. The optimum
sequences associated with the linear intervals $(P_2, P_3)$ and $(P_3, P_4)$ were 1112121... and
11112121..., respectively. If P is a breakpoint, it transforms into another breakpoint, whereas
if P lies within a linear interval, the next optimum look transforms it into the interior of another
linear interval. In general, $P_0$ will always be a breakpoint; that is, the derivative of $U^\infty(P)$ has
a nonzero jump at this point. This occurs because the optimum next look is different on either
side of $P_0$. The point $P_0$ is transformed into either $P_{01}$ or $P_{02}$. We shall soon see that $U^\infty(P)$
is piecewise linear over any interval $(\epsilon, 1 - \epsilon)$, where $\epsilon > 0$, if and only if $P_{01}$ and $P_{02}$ are
transformed back into $P_0$ by finite sequences of optimum looks.

The behavior of P is interesting in that once P belongs to the interval $(P_{01}, P_{02})$ it remains
in this interval for all time if optimum looks are used. If P is less than $P_0$, a look into box 2 is
required and P increases to a new point that cannot exceed $P_{02}$. Similarly, if P is greater than
$P_0$, a look into box 1 is required, and although P decreases in value it must still be larger than
$P_{01}$. In contrast to this behavior, if P lies outside $(P_{01}, P_{02})$ it will eventually transform into
it. If P is greater than $P_{02}$ this transformation is accomplished by a sequence of looks into
box 1. If P is less than $P_{01}$, a sequence of looks into box 2 serves the purpose (the bounding
points P = 0 and P = 1 are exceptions because they transform into themselves). As a result of
these properties, $(P_{01}, P_{02})$ will be called the recurrent region and the interiors of $(0, P_{01})$ and
$(P_{02}, 1)$ will be called the transient regions.

The recurrent region is of special interest for several reasons. First, as we have seen
from our example, once $U^\infty(P)$ is known within this region it is not too difficult to compute
$U^\infty(P)$ for any P lying outside it. In addition, $U^\infty(P)$ attains its maximum somewhere inside
this region. The point P* at which this occurs corresponds to the evader's good strategy in $G^\infty$.
Also, we shall find that the infinite search sequences and the associated payoff functions optimum
at P* are all that are needed to derive the searcher's good strategy in $G^\infty$.

The proof that $U^\infty(P)$ is a maximum inside $(P_{01}, P_{02})$ is quite simple. At $P_0$ a look into
either box is optimum and we can write

$$\begin{aligned}
U^\infty(P_0) &= U^\infty(P_0; 1) = 1 + (P_0 r_1 + 1 - P_0)\; U^\infty(P_{01}) \\
&= U^\infty(P_0; 2) = 1 + [P_0 + (1 - P_0)\, r_2]\; U^\infty(P_{02}).
\end{aligned}$$

It follows that $U^\infty(P_{01}) = U^\infty(P_{02})$. The function $U^\infty(P)$ is convex, and furthermore, it can be
shown that it cannot be flat over the whole recurrent region. Therefore P* must lie inside the
recurrent region.

2.6 PERIODIC BEHAVIOR IN THE RECURRENT REGION

In the previous example we found that there were two intervals inside $(P_{01}, P_{02})$ in which
$U^\infty(P)$ was linear and that the optimum search sequence associated with each was periodic. This
was true because, with the detection probabilities equal, a look into each of the two boxes
transformed P back into itself and looks of this form were optimum in $(P_{01}, P_{02})$. With the periodicity
that resulted, P could oscillate during the search process only between two points, one in each
of the two intervals. Finally, since P transformed back into itself after two looks, it was easy
to compute $U^\infty(P)$ in closed form for each of the two intervals.

In this section, we shall study the conditions under which P transforms into itself after $n_1$
looks have been made into box 1 and $n_2$ looks have been made into box 2. We shall find that when
these conditions hold, some ordering of $n_1$ looks into box 1 and $n_2$ looks into box 2 will be
optimum for each P belonging to the recurrent region. Furthermore, over this region $U^\infty(P)$ will
consist of linear segments, with $n_1$ in $(P_0, P_{02})$ and $n_2$ in $(P_{01}, P_0)$. A point P will
transform into each of these intervals over which $U^\infty(P)$ is linear before it returns to its starting
point. We shall see that within each interval the associated optimum sequence is periodic with
period $n_1 + n_2$ and that the periodic sequences for the various intervals differ from one another
only in phase.

The condition under which P transforms into itself after a total of $n_1$ looks have been made
into box 1 and $n_2$ looks have been made into box 2 is quite simple. Recalling Eq. (2-3),

$$P \xrightarrow{(n_1,\, n_2)} P' = \frac{P r_1^{n_1}}{P r_1^{n_1} + (1 - P)\, r_2^{n_2}},$$

we see that this transformation occurs if

$$r_1^{n_1} = r_2^{n_2}, \tag{2-9}$$

where $n_1$ and $n_2$ are integers, and $r_1$ and $r_2$ are the escape probabilities. Since we are
interested in the first return, $n_1$ and $n_2$ should have no common factor. Equation (2-9) is equivalent
to requiring that

$$\frac{n_1}{n_2} = \frac{\log(r_2)}{\log(r_1)}.$$

A pair of integers $(n_1, n_2)$ exists if the ratio of the logarithms of the escape probabilities is
rational. If this ratio is irrational there still exist rational numbers that are arbitrarily close
to it, and the case where $r_1^{n_1} = r_2^{n_2}$ is of general interest. The only exception occurs when
$r_1$ and/or $r_2$ is equal to zero. This case is quite simple to solve and will be considered later.
The problem of approximating $\log(r_2)/\log(r_1)$ by an $n_1/n_2$ for which $n_1 + n_2$ is not too large will
also be deferred. Here we shall consider the behavior of $\Gamma^\infty$ when the equation $r_1^{n_1} = r_2^{n_2}$ is
satisfied exactly.
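As a small illustration of the condition $r_1^{n_1} = r_2^{n_2}$, the sketch below looks for a coprime pair of integers whose ratio matches the ratio of the logarithms. The function name `return_counts`, the denominator bound and the tolerance are assumptions; because floating-point logarithms are inexact, the equality can only be tested to within a tolerance.

```python
import math
from fractions import Fraction

def return_counts(r1, r2, max_den=50, tol=1e-9):
    """Coprime (n1, n2) with r1**n1 == r2**n2 to within `tol` in the logarithm, else None."""
    ratio = math.log(r2) / math.log(r1)               # should equal n1 / n2
    frac = Fraction(ratio).limit_denominator(max_den)  # closest fraction with small denominator
    n1, n2 = frac.numerator, frac.denominator          # Fraction is always in lowest terms
    if abs(n1 * math.log(r1) - n2 * math.log(r2)) < tol:
        return n1, n2
    return None

print(return_counts(0.512, 0.4096))   # escape probabilities 0.8**3 and 0.8**4
print(return_counts(0.5, 0.3))        # no small pair satisfies the condition exactly
```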

In order to see how the optimum search sequences behave in $(P_{01}, P_{02})$ under this condition,
let us define an ordering of probabilities $P_{-n_2}, P_{-n_2+1}, \ldots, P_{-1}, P_0, P_1, \ldots, P_{n_1-1}, P_{n_1}$. Here,
$P_0$ has its usual meaning. For the moment, we shall only require in addition that $P_{-n_2} = P_{01}$,
$P_{n_1} = P_{02}$, and that $P_i < P_j$ if $i < j$:

$$P_{-n_2} < P_{-n_2+1} < \cdots < P_{-1} < P_0 < P_1 < \cdots < P_{n_1}.$$

If $P_i < P_j$, a look into box 2 will transform $P_i$ to a point which is less than that into which $P_j$ is
transformed.

Making use of similar considerations, we can define the set $\{P_j\}$ by the following relations.
Given $P_j$, a look into box 1 transforms it $n_2$ points to the left. Similarly, given a look into box 2,
P shifts $n_1$ points to the right. Therefore, we can write

$$P_j \xrightarrow{(k_1,\, k_2)} P_{j - k_1 n_2 + k_2 n_1}$$

and we see that

$$P_j \xrightarrow{(n_1,\, n_2)} P_j.$$


If $P_j$ is greater than $P_0$, the searcher's strategy calls for a look into box 1, and if $P_j$ is less
than $P_0$ he should look into box 2. Let us consider a $P_j$ that is unequal to $P_{01}$, $P_0$, or $P_{02}$. A
little thought will show that since $n_1$ and $n_2$ have no common factor, $P_j$ will eventually transform
into $P_0$ as a result of an optimum sequence. At this point the searcher can look into either box
and P can transform into $P_{01}$ or $P_{02}$. In either case, the next look transforms P into $P_{n_1 - n_2}$,
and after a total of $n_1$ looks into box 1 and $n_2$ looks into box 2, P returns to $P_j$. Therefore,
associated with each $P_j$ are two optimum periodic sequences. These two sequences differ only
in the order in which the two boxes are examined after $P_j$ has transformed into $P_0$.

In order to calculate the actual value of an arbitrary $P_j$, we must note the number of looks
into the two boxes $(k_1, k_2)$ which transforms it into $P_0$. Then,

$$P_j \xrightarrow{(k_1,\, k_2)} P_0 = \frac{P_j r_1^{k_1}}{P_j r_1^{k_1} + (1 - P_j)\, r_2^{k_2}}.$$

A simple manipulation reveals that

$$P_j = \frac{P_0 r_2^{k_2}}{P_0 r_2^{k_2} + (1 - P_0)\, r_1^{k_1}}. \tag{2-10}$$

The entire set $\{P_j\}$ can be evaluated in this manner.

As has been mentioned, there are two optimum periodic sequences associated with each $P_j$.
If we consider a P that lies just to the right of $P_j$ we see that either sequence will transform P
into the interval $(P_0, P_1)$ as $P_j$ transforms into $P_0$. A look into box 1 is required next, and the
sequence associated with $P_j$ that calls for this look will eventually transform P back into itself.
Working through the same argument when P lies just to the left of $P_{j+1}$, we can conclude that
the optimum periodic sequence common to $P_j$ and $P_{j+1}$ is the unique optimum sequence for all P
inside $(P_j, P_{j+1})$. The function $U^\infty(P)$ consists of a single linear segment over the interval
$(P_j, P_{j+1})$. The associated optimum sequence is periodic, with period $n_1 + n_2$, and one period of
this sequence transforms P into each of the other intervals in $(P_{01}, P_{02})$ before transforming
back into itself. Hence, the sequences associated with each interval are identical except in phase.


2.6.1 Example Where $r_1^4 = r_2^3$: Calculation of $U^\infty(P)$

Let us examine the case where $r_1^4 = r_2^3$. This will not only clarify the previous discussion,
but will provide a device for showing how $U^\infty(P)$ can be calculated when the optimum search
sequences are periodic inside the recurrent region. Ordering the breakpoints $P_{01} = P_{-3}, P_{-2},
P_{-1}, P_0, P_1, \ldots, P_4 = P_{02}$ on the real line, we can designate each interval that results by $\pi_j$ as
indicated below.
In order to determine the periodic sequences associated with each interval it is convenient
to draw a chain diagram showing the manner in which these intervals transform into each other.
Each state in the chain represents an interval $\pi_j$. Starting with $\pi_1$, a look into box 1 shifts P
three ($n_2$) intervals to the left to $\pi_{-3}$, at which point a look into box 2 transforms it four ($n_1$)
segments to the right to $\pi_2$. If the process is continued in this manner, the following diagram
results. It is clear that one period of each optimum sequence involves $n_1$ looks into box 1 and
$n_2$ looks into box 2, and that these sequences differ only in phase. The sequence associated with
$s_2$ is 1212112, 1212112, ... and so forth. In order to calculate the breakpoints it is convenient
to note that $P_k \to P_0$ as $s_k \to s_{-1}$ when k is positive, and that $P_{-k} \to P_0$ as $s_{-k} \to s_1$. For
example, $P_1 \xrightarrow{(3,\,2)} P_0$, so that

$$P_1 = \frac{P_0 r_2^{2}}{P_0 r_2^{2} + (1 - P_0)\, r_1^{3}}.$$
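The chain of intervals and the breakpoints can be generated mechanically. In the sketch below each interval is identified by the index j of its lower breakpoint, so that it stands for $(P_j, P_{j+1})$; this indexing, the function names, and the choice $r_1 = 0.512$, $r_2 = 0.4096$ are assumptions made for illustration and need not coincide with the labels used in the chain diagram.

```python
def interval_sequence(j, n1, n2):
    """One period of looks for the interval (P_j, P_{j+1}), with -n2 <= j < n1.

    Returns the look sequence as a string and the index reached after one period."""
    seq = []
    for _ in range(n1 + n2):
        if j >= 0:            # interval lies to the right of P_0: look into box 1
            seq.append("1")
            j -= n2           # box 1 shifts n2 points to the left
        else:                 # interval lies to the left of P_0: look into box 2
            seq.append("2")
            j += n1           # box 2 shifts n1 points to the right
    return "".join(seq), j

def breakpoints(P0, r1, r2, n1, n2):
    """Breakpoints P_j for j = -n2..n1, assuming r1**n1 == r2**n2."""
    x = r1 ** (1.0 / n2)                  # one index step divides the odds P/(1-P) by x
    odds0 = P0 / (1 - P0)
    return {j: (odds0 / x ** j) / (1 + odds0 / x ** j) for j in range(-n2, n1 + 1)}

r1, r2 = 0.512, 0.4096                    # r1**4 == r2**3
q1, q2 = 1 - r1, 1 - r2
P0 = q2 / (q1 + q2)
bp = breakpoints(P0, r1, r2, 4, 3)
for j in range(-3, 4):
    print(j, round(bp[j], 4), round(bp[j + 1], 4), interval_sequence(j, 4, 3)[0])
```

Every interval returns to itself after seven looks, and the seven printed sequences are cyclic shifts of one another.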

Once we have found the optimum sequence associated with an interval over which $U^\infty(P)$ is
linear, we are in a position to calculate the payoff over this interval. This calculation is fairly
straightforward. However, the general techniques that can be employed will be considered in
some detail, since a thorough understanding of them is necessary when the more general game
G, in which moving is allowed, is studied in Chapter 4.

The general approach that we shall use is to calculate the payoff which results when a
particular search sequence is used. Associated with each state $s_j$ in a chain diagram is a unique
infinite sequence. We shall let $U_j^\infty(P)$ represent the payoff that results when this sequence is used.
The payoff associated with $s_j$ is, therefore, $U_j^\infty(P)$ and it is valid for all P belonging to the
interval (0, 1). If the state $s_j$ generates a sequence optimum over the interval $\pi_j$, then
$U_j^\infty(P) = U^\infty(P)$ over this interval.

Usually, the state $s_j$ will be assumed to generate a sequence that is optimum over $\pi_j$. Thus,
we shall usually assume that $U_j^\infty(P) = U^\infty(P)$ when P belongs to $\pi_j$.

At times, however, the payoff of an approximately optimum search sequence will be
considered. In this situation, $U_j^\infty(P)$ will be the exact payoff associated with $s_j$. The approximation
that $U_j^\infty(P) = U^\infty(P)$ over $\pi_j$ results from associating with this interval the approximately optimum
search state $s_j$.

The payoff $U_j^\infty(P)$ is defined for a fixed sequence. This payoff, therefore, must be linear in
P and we can express it in the form

$$U_j^\infty(P) = a_j P + b_j (1 - P).$$
Perhaps this is not the most obvious form in which the function could be expressed, but it will
prove to be the most convenient. Although the evader may have chosen to hide in box 1 with
probability P, once he has made this choice he knows exactly where he is. The quantity $a_j$ is
his expected payoff if he is actually in box 1, and $b_j$ is his expected payoff if he is actually in
box 2 when the searcher uses the sequence of $s_j$.

When this formulation is used, the manner in which the payoff of one state is related to that
of the state to which it is connected by the next look becomes quite simple. If $s_i \xrightarrow{\,1\,} s_j$, we
may use Eq. (2-4) to write

$$U_i^\infty(P) = a_i P + b_i (1 - P) = 1 + (P r_1 + 1 - P)\; U_j^\infty\!\left(\frac{P r_1}{P r_1 + 1 - P}\right),$$

which shows that

$$a_i = 1 + r_1 a_j, \qquad b_i = 1 + b_j. \tag{2-11}$$

Similarly, if $s_i \xrightarrow{\,2\,} s_j$, we find that

$$a_i = 1 + a_j, \qquad b_i = 1 + r_2 b_j. \tag{2-12}$$

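If we wish to express $U_i^\infty(P)$ in terms of a $U_j^\infty(P)$ that follows after several looks, we may
simply compound the above equations. However, it will prove to be more convenient to use an
alternative formulation. In this formulation we can consider a complete set of mutually
exclusive events, multiply the reward associated with each by the probability of the event, and sum.
For example, if $s_i \xrightarrow{1211} s_j$ and the evader is in box 1, he can be found on the first, third, or
fourth look or he can survive all of them. The rewards associated with these events are 1, 3,
4 and $4 + a_j$, respectively, and the associated probabilities of these events are $q_1$, $q_1 r_1$, $q_1 r_1^2$
and $r_1^3$. We can, therefore, write

$$a_i = q_1 (1 + 3 r_1 + 4 r_1^2) + r_1^3 (4 + a_j).$$

In order to write general expressions of this form, consider the set $\{t_m(n)\}$ where $t_m(n)$
represents the look on which box m is examined for the n-th time. In the above case, $t_1(1) = 1$,
$t_1(2) = 3$, $t_1(3) = 4$ and $t_2(1) = 2$. If $s_i$ is transformed into $s_j$ by a sequence defined by $\{t_m(n)\}$
that involves a total of $k_1$ looks into box 1 and $k_2$ looks into box 2, then

$$\begin{aligned}
a_i &= q_1 \sum_{n=1}^{k_1} t_1(n)\, r_1^{\,n-1} + r_1^{k_1} (k_1 + k_2 + a_j), \\
b_i &= q_2 \sum_{n=1}^{k_2} t_2(n)\, r_2^{\,n-1} + r_2^{k_2} (k_1 + k_2 + b_j).
\end{aligned} \tag{2-13}$$

When one full period of a periodic sequence is considered, $s_j = s_i$ and these equations can be
solved directly. For the sequence 1212121 of the present example,

$$\begin{aligned}
a_i &= \frac{1}{1 - r_1^4} \left[ q_1 (1 + 3 r_1 + 5 r_1^2 + 7 r_1^3) + 7 r_1^4 \right], \\
b_i &= \frac{1}{1 - r_2^3} \left[ q_2 (2 + 4 r_2 + 6 r_2^2) + 7 r_2^3 \right].
\end{aligned}$$

Once $U_i^\infty(P)$ is known for one state, the payoffs for the other states can be computed in order
around the chain by means of Eqs. (2-11) and (2-12). If only a particular payoff is desired,
Eq. (2-13) may be used.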
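The event-by-event bookkeeping can be sketched as follows. The function name `coefficients` and its argument convention (a string of "1" and "2" characters, with the follow-on coefficients omitted when the sequence repeats forever) are assumptions for illustration. For the repeating case the sketch solves the linear relation $a = S_1 + r_1^{k_1}(k_1 + k_2 + a)$ for a, and similarly for b, so the sequence must contain at least one look into each box.

```python
def coefficients(seq, q1, q2, a_next=None, b_next=None):
    """Coefficients (a, b) of the payoff a*P + b*(1 - P) for the look sequence `seq`.

    If (a_next, b_next) are given, `seq` is followed by a sequence with those
    coefficients; otherwise `seq` is taken to repeat forever."""
    r1, r2 = 1 - q1, 1 - q2
    t1 = [t for t, box in enumerate(seq, start=1) if box == "1"]   # looks into box 1
    t2 = [t for t, box in enumerate(seq, start=1) if box == "2"]   # looks into box 2
    k1, k2, n = len(t1), len(t2), len(seq)
    s1 = q1 * sum(t * r1 ** m for m, t in enumerate(t1))   # found on the (m+1)-th look into box 1
    s2 = q2 * sum(t * r2 ** m for m, t in enumerate(t2))
    if a_next is None:
        # periodic case: a = s1 + r1**k1 * (n + a), solved for a (needs k1, k2 >= 1)
        return (s1 + n * r1 ** k1) / (1 - r1 ** k1), (s2 + n * r2 ** k2) / (1 - r2 ** k2)
    return s1 + r1 ** k1 * (n + a_next), s2 + r2 ** k2 * (n + b_next)

# Identical boxes with q = 1/2 (assumed illustrative values).
a12, b12 = coefficients("12", 0.5, 0.5)          # sequence 1212...
a21, b21 = coefficients("21", 0.5, 0.5)          # sequence 2121...
print(a12, b12, a21, b21)
# A single look into box 1 followed by 2121... must reproduce 1212...
print(coefficients("1", 0.5, 0.5, a21, b21))
```

The last line is a consistency check against the one-look relations: a look into box 1 gives a = 1 + r1 * a_next and b = 1 + b_next.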

In Fig. 2 the payoff over the recurrent region is graphed for the case where $r_1 = 0.512$ and
$r_2 = 0.4096$, i.e., where $r_1^4 = r_2^3 = x^{12}$ and x = 0.8.

These  equations  can  also  be  used  to  compute  the  payoffs  in  the  transient  regions.  However, 
the  linear  intervals  approach  zero  in  length  as  P  approaches  zero  or  one,  and  one  should  not 
attempt  to  calculate  every  payoff. 

Equation (2-13) can be used to calculate the payoff for an arbitrary sequence even if it is
not periodic. In this case, the set $\{t_m(n)\}$ must be used to represent the entire infinite sequence
and

$$a = q_1 \sum_{n=1}^{\infty} t_1(n)\, r_1^{\,n-1}, \qquad b = q_2 \sum_{n=1}^{\infty} t_2(n)\, r_2^{\,n-1}.$$

If the payoff is known for any fixed sequence as a function of P, Eq. (2-13) may be used to
compute the payoff that results if this sequence is preceded by an arbitrary finite sequence.

2.6.2  Applicability  of  the  Periodic  Case 

It should be clear that for any pair of escape probabilities $(r_1, r_2)$ where both are unequal
to one, a pair of integers $(n_1, n_2)$ can be chosen which makes $n_1/n_2$ arbitrarily close to
$\log r_2/\log r_1$. The choice of $n_1$ and $n_2$ depends upon the accuracy desired in the approximation.
The choice that is made will determine the sequences used and the payoff that results.
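A simple way to explore this trade-off is to search all coprime pairs whose period $n_1 + n_2$ does not exceed a chosen bound. The function name `best_pair` and the bound `max_period` (which should be at least 2) are assumptions; the sketch uses exhaustive search for clarity rather than continued fractions.

```python
import math

def best_pair(r1, r2, max_period):
    """Coprime (n1, n2) with n1 + n2 <= max_period minimizing |n1/n2 - log r2 / log r1|."""
    target = math.log(r2) / math.log(r1)
    best = None
    for n2 in range(1, max_period):
        for n1 in range(1, max_period - n2 + 1):
            if math.gcd(n1, n2) != 1:
                continue                      # keep only pairs with no common factor
            err = abs(n1 / n2 - target)
            if best is None or err < best[0]:
                best = (err, n1, n2)
    return best[1], best[2]

for bound in (3, 5, 8, 11):
    print(bound, best_pair(0.5, 0.3, bound))  # assumed illustrative escape probabilities
```

Raising the bound improves the approximation only at certain periods, which is one reason a well-chosen small pair can already be adequate.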

Even when $n_1 + n_2$ is fairly small, the approximation can be fairly good if $n_1$ and $n_2$ are
well chosen. Associated with each search state $s_j$ is an interval $\pi_j$. The breakpoints that define
the boundaries of each such interval are those points that transform into $P_0$ sometime during the
first $n_1 + n_2 - 1$ looks of the actual optimum sequence ($P_{-n_2}$ and $P_{n_1}$ should be calculated in this
manner even though they will be only approximately equal to $P_{01}$ and $P_{02}$, respectively). As a
result, the first period of the sequence associated with $s_j$ will be optimum for any P belonging
to $\pi_j$. During the next few periods, errors can be made only when P is close to $P_0$, and their
effect will be small. With increasing time, the errors become more frequent and more serious.
The effect of these larger errors is mitigated, however, by the decreasing probability that the
game lasts long enough for them to be made.

Perhaps the most intriguing aspect of an approximation of this type concerns the appearance
of $U^\infty(P)$. Given a choice $(n_1, n_2)$ the resulting payoff in $(P_{-n_2}, P_{n_1}) = (P_{01}, P_{02})$ will consist of
$n_1 + n_2$ linear segments. Since the associated sequences are only approximately optimum, we
cannot expect the function to be exactly continuous at the breakpoints. The interesting consideration,
however, concerns the piecewise linear appearance of this function compared with that of the
optimum payoff function. If $\log r_2/\log r_1$ is irrational, or if it is equal to $n_1'/n_2'$ where
$n_1' + n_2' \gg n_1 + n_2$, the optimum payoff function will have many more linear segments and
dividing breakpoints. A breakpoint occurs at a point that transforms into $P_0$, and the sequences
associated with the interval on either side will agree up to the time when this occurs. Breakpoints
that transform quickly into $P_0$ will be much more apparent than those that transform into $P_0$ later
on. If the integers $(n_1, n_2)$ are well chosen, the important breakpoints will appear in the payoff
function of the approximating strategy.

2.7 $\Gamma^\infty$ WHEN $q_1$ AND/OR $q_2$ IS EQUAL TO ONE

When the detection probability of at least one box is equal to one, the optimum search
strategy can never consist of a periodic sequence with looks into both boxes. Once a box with unity
detection probability has been examined, it should never be examined again. If $q_1 = q_2 = 1$ the
search can last for at most two looks, whereas if q = 1 for only one box, an optimum sequence
can call for at most one look into it. In either case, the game is fairly easy to solve.

When both detection probabilities are equal to one, the game is trivial. If P is greater
than one-half, the optimum search strategy calls for a look into box 1, followed by a look into
box 2 if necessary. When P is less than one-half, the reverse is true. Clearly,
$U^\infty(P) = P + 2(1 - P)$ for $P \geq 1/2$, and $U^\infty(P) = 2P + (1 - P)$ for $P \leq 1/2$.

When q = 1 for only one box, the optimum search strategy is almost as simple. Let us
consider the case where $q_2 = 1$ and $q_1 < 1$. If P is less than $P_0$, box 2 should be examined first.
P then becomes equal to one and all the future looks should be made into box 1. Since $U^\infty(1) =
1/q_1$, we find that $U^\infty(P) = P[1 + (1/q_1)] + (1 - P)$ when $P \leq P_0 = 1/(1 + q_1)$, and the payoff
function consists of only one linear segment over $(0, P_0)$.

Over the interval $(P_0, 1)$, the payoff function consists of many segments. In fact, there are
an infinite number of them, since as P approaches one they become arbitrarily short. For all
P in a given interval over which $U^\infty(P)$ is linear, the same number of looks into box 1 is
required before a look into box 2 is called for. We can define each breakpoint $P_k$ by
$P_k \xrightarrow{(k,\,0)} P_0$, and designate $\pi_k$ as the interval $(P_{k-1}, P_k)$. If P belongs to $\pi_k$, then k looks
into box 1 transform it to the left of $P_0$. Using Eq. (2-13) we find, after a simple manipulation, that

$$U^\infty(P) = P\left[\frac{1}{q_1} + r_1^{k}\right] + (1 - P)(k + 1),$$

where

$$P \in \pi_k = \left(\frac{1}{1 + q_1 r_1^{k-1}},\; \frac{1}{1 + q_1 r_1^{k}}\right).$$

The function that applies when $q_1 = 1/2$ is graphed in Fig. 3.
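The piecewise form for $q_2 = 1$ is easy to evaluate numerically. The sketch assumes breakpoints $P_k = 1/(1 + q_1 r_1^k)$ and the segment $P[1/q_1 + r_1^k] + (1 - P)(k + 1)$ between $P_{k-1}$ and $P_k$, with k = 0 covering $(0, P_0)$; the function name `payoff_q2_one` is an assumption.

```python
def payoff_q2_one(P, q1):
    """Payoff and number k of initial looks into box 1 when q2 = 1 and q1 < 1."""
    r1 = 1 - q1
    k = 0
    while P > 1 / (1 + q1 * r1 ** k):     # move right until P <= P_k
        k += 1
    # evader in box 1: expected 1/q1 + r1**k looks; evader in box 2: found on look k + 1
    return P * (1 / q1 + r1 ** k) + (1 - P) * (k + 1), k

for P in (0.5, 0.75, 0.85, 0.95):
    print(P, payoff_q2_one(P, 0.5))       # q1 = 1/2, as in Fig. 3
```

Adjacent segments take the same value at each breakpoint, so the function is continuous even though the number of initial looks into box 1 changes there.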

It is worthwhile to note that there are dangers inherent in assuming that a detection
probability is equal to one. Usually, if a search strategy that is optimum for one pair of detection
probabilities is used for a slightly different pair, the payoff will be almost optimum. However,
if one of the q's is assumed to equal one when it is only close to one this is no longer so. In this
case, the searcher will look only once into that box. In the unlikely but possible event that the
evader is in that box and escapes detection, he will receive an infinite payoff. This is clearly
undesirable. If both boxes are assumed to have unity detection probabilities the danger is not
quite so great. If the evader is not found after the first two looks, the fallacy is revealed.

Fig. 3. $U^\infty(P)$: $q_1 = 1/2$; $q_2 = 1$.

It should be mentioned, on the other hand, that $U^\infty(P)$, the payoff that applies when the actual
optimum search strategy is used, is well behaved as one or both of the q's approach one.
Therefore, if one wishes to get a rough idea of what the payoff is when $q_i$ is close to one, the $q_i = 1$
solution is valid as an approximation.

2.8  SOLUTION  OF  G" 

Now that we have considered the modified game, we are in a position to find the good
strategies and value of $G^\infty$, where the evader is not required to reveal his strategy to the searcher.

In $F^\infty$, a great deal of emphasis was placed on deriving the searcher's optimum strategy and the
resulting payoff as a function of $P$. The evader's optimum strategy was scarcely mentioned
because it was so simple: he should select the $P$ at which $U^\infty(P)$ is a maximum, i.e., $P^*$, where
$U^\infty(P^*) = v^\infty$. We now need to find the strategy for the searcher that limits the evader to this
amount when $P$ is unknown. This will be the searcher's good strategy, and it will imply that $P^*$
is the evader's good strategy and that $v^\infty$ is the value.

The searcher's good strategy is easily derived. We can usually expect that $U^\infty(P)$ will be a
maximum at a unique point and therefore that the segments on either side will have slopes of
opposite sign. Let the associated intervals be designated by $\pi_i$ and $\pi_j$. If the searcher chooses to
use the sequence associated with $\pi_i$ with probability $y_i$ and that associated with $\pi_j$ with probability
$y_j$, where $y_i + y_j = 1$, the payoff will be

$$U^\infty(P;\, y_i, y_j) = y_i U_i^\infty(P) + y_j U_j^\infty(P) .$$

$P$ can take any value between zero and one and is unknown to the searcher. Both $U_i^\infty(P)$ and
$U_j^\infty(P)$ are equal to $v^\infty$ at $P^*$ and have slopes (with respect to $P$) of opposite sign. It follows
that there exists a probability distribution $(y_i, y_j)$ that yields a payoff equal to $v^\infty$ for all $P$.
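The equalizing distribution can be written down explicitly. Since both segments pass through $v^\infty$ at $P^*$, the mixture is independent of $P$ exactly when the weighted slopes cancel. The sketch below assumes the two slopes are already known; the function name and the sample slopes are hypothetical.

```python
def equalizing_mix(slope_i, slope_j):
    """Weights (y_i, y_j) with y_i + y_j = 1 and y_i*slope_i + y_j*slope_j = 0.

    The slopes are those of the two linear segments of the payoff function
    that meet at its maximum, so they must have opposite signs.
    """
    if slope_i * slope_j >= 0:
        raise ValueError("the two segments must have slopes of opposite sign")
    y_i = slope_j / (slope_j - slope_i)
    return y_i, 1.0 - y_i


if __name__ == "__main__":
    # Hypothetical slopes of the two segments adjoining the maximum.
    y_i, y_j = equalizing_mix(2.0, -1.0)
    print(y_i, y_j)                       # weights on the two sequences
    print(y_i * 2.0 + y_j * (-1.0))       # slope of the mixture
```

The steeper segment receives the smaller weight, so the mixed payoff is flat in $P$.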

CHAPTER  3

$G^0$:  SEARCH  EVASION  GAME  WITH  ZERO  MOVING  COST

3.1  INTRODUCTION 

In this chapter let us consider the other limiting form of our search evasion game: the
game $G^0$, in which the evader can change boxes after each look with zero cost. Since $\mu$, the
moving cost, is equal to zero, the payoff from the searcher to the evader is again simply equal
to the expected number of looks required to find the evader. Here, however, the evader plays
an active role throughout the game. We should expect that, because of this additional freedom,
he can guarantee himself a payoff larger than $v^\infty$.

If  we  assume  that  the  evader  is  playing  against  an  intelligent  searcher,  we  should  expect  him 
to  make  his  moves  in  a  judicious  manner.  The  decision  rule  for  making  these  moves  should  be 
formulated  so  as  to  accomplish  a  specific  purpose  —  to  maximize  the  guaranteed  payoff.  The 
evader  should  not  move  according  to  whim  but  only  according  to  a  carefully  defined  rule.  Since 
the characteristics of such a decision rule will be common to the more general game where
moves must be paid for, let us examine them in some detail.

3.2  A  PROPERTY  OF  EVASION  STRATEGIES;  EVADER'S  GOOD  STRATEGY  IN  G° 

In general, a moving strategy, or behavioral strategy, for the evader should specify as a
function of past play a probability distribution for moving before the next look. Thus, at some
point in the play, the evader may decide to move to box 1 if in box 2 with probability $x_1$ and to
move to box 2 if in box 1 with probability $x_2$. In order to see what the effect of such a rule is,
let us assume that we as observers know the past search sequence and all such decision rules
that have been used up to this point. Given this information, we can calculate $P$, the probability
that the evader is in box 1. By exercising the above decision rule, the evader transforms $P$
into a new value, $P'$. Clearly,

$$P' = P(1 - x_2) + (1 - P)\,x_1$$

and

$$1 - P' = P x_2 + (1 - P)(1 - x_1) ,$$

where only one of these equations is necessary. A result of such a strategy is, therefore, the
transformation of the state variable $P$ into $P'$.
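A small numerical check of this transformation may be helpful. The sketch below takes $x_1$ as the probability of moving from box 2 to box 1 and $x_2$ as the probability of moving from box 1 to box 2; the function name and the sample numbers are ours.

```python
def transform(P, x1, x2):
    """New probability that the evader is in box 1 after a move rule.

    x1: probability of moving to box 1 when in box 2
    x2: probability of moving to box 2 when in box 1
    """
    return P * (1.0 - x2) + (1.0 - P) * x1


if __name__ == "__main__":
    P, x1, x2 = 0.5, 0.4, 0.2
    P_new = transform(P, x1, x2)
    # The complementary equation gives the same information.
    complement = P * x2 + (1.0 - P) * (1.0 - x1)
    print(P_new, 1.0 - P_new, complement)
```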

In fact, this transformation is the only effect that the moving strategy has on the future
behavior of the game. In order to show this, let us assume that the evader is required to reveal
each decision rule of the above form to the searcher when it is exercised (note that we are not
requiring him to reveal his complete strategy all at once). Such a revelation will not hurt the
evader if he is using his good strategy and the searcher is intelligent enough to use his. On the
other hand, the evader will certainly suffer a disadvantage if he uses a poor strategy, since the
searcher can use the value of $P'$ to determine where he should look next. This is the only
manner in which the searcher can make use of this knowledge, and it follows that the only purpose
of a moving strategy is this transformation.

We noted earlier that a moving strategy $(x_1, x_2)$ could be a function of the entire previous
play. However, given $x_1$ and $x_2$, the a posteriori $P'$ is a function of the a priori $P$ alone and
not of the entire past play in all its detail. Therefore, the evader's good strategy must belong
to the class of behavioral strategies in which the move probabilities $x_1$ and $x_2$ are functions of
the a priori $P$. These arguments apply equally well to the general game where $\mu \ne 0$.

A strategy belonging to this class is completely defined by the functions $x_1(P)$ and $x_2(P)$ and
the initial $P$ that the evader uses when he first hides. The influence of the functions $x_1(P)$ and
$x_2(P)$ on the behavior of the game may be described completely by the mapping of $P$ that they
produce. We shall determine the evader's good strategy in terms of this mapping. We can
expect the mapping associated with the good strategy to be unique. For a given mapping, however,
the functions $x_1(P)$ and $x_2(P)$ are not unique but have one degree of freedom. Therefore, the
evader's good strategy will not be unique so far as these functions are concerned. In the next
chapter, we shall see that when $\mu \ne 0$, the cost associated with a transformation of $P$ depends
upon the particular functions used. The functions that produce a given mapping at minimum cost
are unique.

The evader's good strategy in $G^0$ is easily derived. In a manner analogous to that used in
the last chapter, let us consider the modified game $F^0$ in which the evader must tell the searcher
the value of $P$ that applies before each look. In this game, the evader is completely free to
change $P$ after each look and therefore needs to consider only the effect that $P$ has on the next
look. If a given $P$ is optimum before one look it should be used before each look. Therefore,
after each look the evader should return $P$ to its original position if it is optimum and we may
write

$$U^0(P) = \min \begin{cases} U^0(P;\,1) = 1 + [P r_1 + 1 - P]\, U^0(P) \\ U^0(P;\,2) = 1 + [P + (1 - P)\, r_2]\, U^0(P) \end{cases}$$

If a given look is optimum once, it is always optimum. Therefore,

$$U^0(P) = \min \begin{cases} U^0(P;\,1) = \dfrac{1}{P q_1} \\[2ex] U^0(P;\,2) = \dfrac{1}{(1 - P)\, q_2} \end{cases}$$

The optimum $P$ is that which maximizes $U^0(P)$ and is $P_0 = q_2/(q_1 + q_2)$. Thus, the evader hides
in a manner that causes the probability of detection on each look to be independent of where the
look is made. Using this optimum strategy, the evader guarantees a payoff in $F^0$ of $(1/q_1) +
(1/q_2)$.
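The maximizing property of $P_0$ is easy to confirm numerically. The sketch below evaluates $U^0(P) = \min\{1/(P q_1),\ 1/((1 - P) q_2)\}$ on a grid and compares the largest value with $(1/q_1) + (1/q_2)$. The detection probabilities used are arbitrary sample values, and the function names are ours.

```python
def payoff_F0(P, q1, q2):
    """Payoff in the modified zero-cost game when the evader always resets
    the state variable to P and the searcher looks where detection is likelier."""
    return 1.0 / max(P * q1, (1.0 - P) * q2)


def evader_optimum(q1, q2):
    """Closed-form maximizer P_0 and the payoff it guarantees."""
    return q2 / (q1 + q2), 1.0 / q1 + 1.0 / q2


if __name__ == "__main__":
    q1, q2 = 0.5, 0.25          # sample detection probabilities
    P0, value = evader_optimum(q1, q2)
    grid_best = max(payoff_F0(i / 1000.0, q1, q2) for i in range(1001))
    print(P0, value, payoff_F0(P0, q1, q2), grid_best)
```

At $P_0$ the two per-look detection probabilities $P q_1$ and $(1 - P) q_2$ coincide, which is why neither look is preferable.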

The functions $x_1(P)$ and $x_2(P)$ that achieve the optimum mapping are defined by the equation

$$P_0 = \frac{q_2}{q_1 + q_2} = P\,[1 - x_2(P)] + (1 - P)\,x_1(P) ,$$

where we must require that $0 \le x_1, x_2 \le 1$. As long as the evader uses a good strategy
throughout play, his strategy may be more simply defined. Given a look into box 1,

$$P_0 = \frac{q_2}{q_1 + q_2} \;\longrightarrow\; P = \frac{q_2 - q_1 q_2}{q_1 + q_2 - q_1 q_2} .$$

Similarly, given a look into box 2,

$$P_0 = \frac{q_2}{q_1 + q_2} \;\longrightarrow\; P = \frac{q_2}{q_1 + q_2 - q_1 q_2} .$$

Therefore, after each look into box 1, the evader should select $x_1$ and $x_2$ which satisfy

$$\frac{q_2}{q_1 + q_2} = \frac{(q_2 - q_1 q_2)(1 - x_2) + q_1 x_1}{q_1 + q_2 - q_1 q_2} ,$$

and after each look into box 2 he should select those which satisfy

$$\frac{q_2}{q_1 + q_2} = \frac{q_2 (1 - x_2) + (q_1 - q_1 q_2)\, x_1}{q_1 + q_2 - q_1 q_2} .$$

Again, $x_1$ equals the probability of moving to box 1 if the evader is in box 2 and $x_2$ is the
probability of moving to box 2 if he is in box 1.
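As a worked illustration, the sketch below computes the state variable after an unsuccessful look from $P_0 = q_2/(q_1 + q_2)$ and then picks one admissible pair $(x_1, x_2)$ that restores $P_0$. Because the pair has one degree of freedom, the choice made here (move only out of the box that has become too likely) is just one convenient solution, not the only one. The function names and sample values are hypothetical.

```python
def state_after_look(box, q1, q2):
    """State variable after an unsuccessful look into `box`, starting from
    P_0 = q2 / (q1 + q2)."""
    d = q1 + q2 - q1 * q2
    return (q2 - q1 * q2) / d if box == 1 else q2 / d


def restoring_moves(P, P0):
    """One admissible (x1, x2) with P*(1 - x2) + (1 - P)*x1 = P0."""
    if P < P0:
        return (P0 - P) / (1.0 - P), 0.0
    if P > P0:
        return 0.0, (P - P0) / P
    return 0.0, 0.0


if __name__ == "__main__":
    q1, q2 = 0.5, 0.25
    P0 = q2 / (q1 + q2)
    for box in (1, 2):
        P = state_after_look(box, q1, q2)
        x1, x2 = restoring_moves(P, P0)
        print(box, P, x1, x2, P * (1.0 - x2) + (1.0 - P) * x1)
```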

In the next section we shall show that the searcher can limit the evader to the payoff $(1/q_1) +
(1/q_2)$. This will prove that the above optimum strategy in $F^0$ is the evader's good strategy in
$G^0$ and that $v^0 = (1/q_1) + (1/q_2)$.


3.3  SEARCHER'S  GOOD  STRATEGY 

The searcher's good strategy is also easily derived. In $G^\infty$ the searcher's good strategy
required each look to be a function of the past search sequence. He made a random choice
between two infinite search sequences and after this choice was made all looks were specified
deterministically. In $G^0$, we find the opposite extreme. The evader can change $P$ at will, and the
searcher cannot make any use of his past sequence in choosing his next look. Therefore, we
may consider only the class of behavioral search strategies in which each look is made
independently of the others; that is, where box 1 is examined with probability $Y$ and box 2 is examined
with probability $1 - Y$.

It is convenient to turn the tables on the searcher and define the modified game $H^0$ in which
he must reveal his probability distribution to the evader. Letting $W^0(Y)$ equal the payoff that
results when the evader uses an optimum strategy, we find that

$$W^0(Y) = \max \begin{cases} W^0(Y;\,1) = 1 + [Y r_1 + (1 - Y)]\, W^0(Y) \\ W^0(Y;\,2) = 1 + [Y + (1 - Y)\, r_2]\, W^0(Y) \end{cases}$$

where $W^0(Y;\, i)$ is the payoff if the evader hides in box $i$. Clearly, if box $i$ is optimum once it is
always optimum. Therefore,

$$W^0(Y) = \max \begin{cases} W^0(Y;\,1) = \dfrac{1}{Y q_1} \\[2ex] W^0(Y;\,2) = \dfrac{1}{(1 - Y)\, q_2} \end{cases}$$

The searcher's optimum strategy minimizes $W^0(Y)$, and we find that he should look into box 1
with probability $Y = q_2/(q_1 + q_2)$ and into box 2 with probability $1 - Y = q_1/(q_1 + q_2)$.

This strategy causes the probability of detection on any look to be independent of where the
evader hides. It limits the evader to the payoff $(1/q_1) + (1/q_2)$, and is, therefore, the searcher's
good strategy in $G^0$.
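The equalizing property of the searcher's mixture can be checked directly. In the sketch below, an evader who sits in box 1 is detected on any look with probability $Y q_1$, and one who sits in box 2 with probability $(1 - Y) q_2$; the expected number of looks is the reciprocal. Function names and sample values are ours.

```python
def searcher_mix(q1, q2):
    """Probabilities of looking into box 1 and box 2 on every look."""
    Y = q2 / (q1 + q2)
    return Y, 1.0 - Y


def expected_looks(box, Y, q1, q2):
    """Expected number of looks to find an evader who stays in `box` when
    each look goes to box 1 with probability Y, independently."""
    per_look_detection = Y * q1 if box == 1 else (1.0 - Y) * q2
    return 1.0 / per_look_detection


if __name__ == "__main__":
    q1, q2 = 0.5, 0.25
    Y, _ = searcher_mix(q1, q2)
    print(expected_looks(1, Y, q1, q2), expected_looks(2, Y, q1, q2),
          1.0 / q1 + 1.0 / q2)
```

Since the per-look detection probability is the same in both boxes, no movement between boxes can change the expected payoff either.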

It is interesting to note that in this game the payoff is equal to $v^0$ if only one player uses
his good strategy. In $G^\infty$, the payoff is always equal to $v^\infty$ if just the searcher uses his good
strategy. If the evader uses his good strategy but the searcher uses a poor strategy, however,
the payoff can be greater than the value.


CHAPTER  4 

SEARCH  EVASION  GAME  WITH  $\mu \ne 0$:  EVADER'S  STRATEGY


4.1  INTRODUCTION 

In the last two chapters we have considered the two limiting forms, $G^0$ and $G^\infty$, of the
two-box search evasion game. In $G^0$ the evader was completely free to move, and we found that after
each look he should return the state variable $P$ to the value that minimized the probability of
detection on the next look. The game degenerated to a sequence of move-look pairs, each of which
was independent of the previous ones (except that the game stopped once the evader was found).

In $G^\infty$ the opposite occurred. Once the evader hid, $P$ became a function of the search process
only. Therefore, the evader had to consider the influence of the whole search process on his
choice of $P$. The searcher's good strategy became deterministic once an initial random selection
was made from two infinite sequences. $G^\infty$ was considered a limiting form of $G$ because it
appeared plausible to assume that the evader would never choose to move if the moving cost became
infinite. We shall find that this is indeed true. In fact, we shall usually find that the evader
should never move (even if the searcher is aware of this) as long as $\mu$ is larger than some finite
$\mu_p$. That is, when $\mu$ is greater than $\mu_p$ we may consider the moving cost prohibitive since the
increase in search time achieved by moving is more than offset by the cost of moving.

In game $G$, when the moving cost is neither prohibitive nor zero we shall find
characteristics intermediate between those of $G^0$ and $G^\infty$. The evader receives one unit each time the
searcher examines a box but must pay $\mu$ units each time he moves. Therefore, he must balance
the increase in search time afforded by moving against the cost of moving. Also, we shall find
that the searcher can make use of his past search sequence in determining where to look next
but must make some random decisions throughout play. The good strategies for the two players,
as would be expected, change in a well behaved manner from those associated with $G^0$ to those
associated with $G^\infty$ as $\mu$ increases. Furthermore, the value of the game decreases
monotonically as $\mu$ increases from zero to $\mu_p$. Once $\mu$ is greater than $\mu_p$, the value is independent of $\mu$
and equal to $v^\infty$ because the evader never incurs a moving charge.

In this chapter we shall develop the evader's good strategy. This will be accomplished by
using the device that we used in the previous chapters: the modified game $F$ in which the evader
must reveal part of his strategy to the searcher. Many of the properties and techniques
developed in studying $F^\infty$ and $F^0$ will be useful here. As before, we shall proceed on faith and
assume that the evader's optimum strategy in the modified game will be his good strategy in $G$.

This faith will be justified, for we shall find in the next chapter that the searcher can indeed
limit the evader in $G$ to a payoff equal to that which the evader can guarantee himself in $F$. Also,
as in the no-move game, we shall find that the optimum search strategies and the associated
payoff functions developed in $F$ will be of use when the searcher's good strategy in $G$ is
considered.

4.2  SOME  RESTRICTIONS  ON  THE  EVADER'S  GOOD  STRATEGY:  EFFICIENT  MOVE  CONDITION

In the last chapter, we saw that the influence of the evader's strategy on the behavior of the
game could be completely characterized by the manner in which it transformed the state variable
$P$ between looks. It followed that the evader's good strategy must belong to a class of behavioral
strategies in which the probability of moving is a function of $P$ and the box in which the evader
finds himself. As a result, we were able to characterize the evader's good strategy in terms of
a mapping and an initial $P$ used at the start of the game.

When $\mu$ is unequal to zero, these properties still hold. As before, once the evader has
exercised a moving strategy, the future payoff can be characterized by the a posteriori value of the
state variable and the strategies that the players use in the future. Now, however, a cost is
associated with a transformation of the state variable, and it is clear that if a given transformation
of $P$ into $P'$ is desired, it should be achieved at minimum cost. This condition causes the move
probabilities $x_1(P)$ and $x_2(P)$ associated with a given mapping to be unique and allows us to
associate with any transformation a unique cost function $C(P \to P')$.

The move probabilities that achieve a given transformation at minimum cost are easily
derived. If the desired transformation is $P \to P'$, then $x_1$ and $x_2$ must satisfy the equation

$$P' = P(1 - x_2) + (1 - P)\,x_1 .$$

The probability that a move occurs is equal to $P x_2 + (1 - P)\,x_1$ and the cost of the transformation
is simply $\mu\,[P x_2 + (1 - P)\,x_1]$. The quantities $x_1$ and $x_2$ must minimize this cost, subject to the
usual restriction that $0 \le x_1, x_2 \le 1$, and we find that

$$P' < P:\quad x_1 = 0 , \qquad x_2 = \frac{P - P'}{P}$$

$$P' > P:\quad x_1 = \frac{P' - P}{1 - P} , \qquad x_2 = 0 \tag{4-1}$$

This implies that the evader should never move from a box unless he wishes to decrease the
probability that he is there. The cost of the transformation, which is now a minimum, is equal
to $\mu(P - P')$ if $P' < P$ and $\mu(P' - P)$ if $P' \ge P$. We may therefore write

$$C(P \to P') = \mu\,|P - P'| . \tag{4-2}$$

With the evader limited to strategies belonging to the class of efficient-move behavioral
strategies defined above, we are in a position to develop the functional equations from which the
good strategy can be computed.
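The efficient-move condition can be illustrated with a short sketch. It returns the minimum-cost pair $(x_1, x_2)$ for a desired transformation $P \to P'$ together with the cost $\mu\,|P - P'|$, and compares it with a wasteful pair that produces the same $P'$ by moving in both directions. The function names and numbers are illustrative assumptions.

```python
def move_cost(P, x1, x2, mu):
    """Expected moving cost: mu times the probability that a move occurs."""
    return mu * (P * x2 + (1.0 - P) * x1)


def efficient_move(P, P_new, mu):
    """Minimum-cost move probabilities for P -> P_new, as in Eq. (4-1)."""
    if P_new < P:
        x1, x2 = 0.0, (P - P_new) / P
    elif P_new > P:
        x1, x2 = (P_new - P) / (1.0 - P), 0.0
    else:
        x1, x2 = 0.0, 0.0
    return x1, x2, move_cost(P, x1, x2, mu)


if __name__ == "__main__":
    P, P_new, mu = 0.5, 0.6, 2.0
    x1, x2, cost = efficient_move(P, P_new, mu)
    print(x1, x2, cost, mu * abs(P - P_new))
    # A wasteful pair that reaches the same P_new by moving both ways:
    print(P * (1 - 0.2) + (1 - P) * 0.4, move_cost(P, 0.4, 0.2, mu))
```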


4.3  MODIFIED  GAMES  F  AND  F':  ASSOCIATED  PAYOFF  FUNCTIONS

In the last chapter, the evader's good strategy was derived by considering the modified
game $F^0$ in which the evader was required to inform the searcher of the value of the state
variable that applied before each look. We can do the same thing here. Now, however, a cost
$C(P \to P')$ is associated with any transformation of the state variable and it will prove convenient
to split the game into two parts. This will allow us to develop two payoff functions, one that
applies before the evader exercises his move strategy and another that applies immediately
thereafter.

These payoff functions can be developed by defining the modified games $F$ and $F'$. In both
games, the evader is required to reveal the value of $P$ that applies before each look. Game $F$
applies prior to the time when the evader exercises his moving strategy. Game $F'$ applies after
this has occurred. Therefore, we can consider $F'$ as the game in which the evader is not allowed
to move until after the next look has been made. Clearly, $F$ and $F'$ are two parts of a single
sequential game. Their relation to each other is shown in the following diagram.

[Diagram: F passes to F' by a MOVE STRATEGY transition; F' returns to F by an UNSUCCESSFUL LOOK
transition or ends in DETECTION.]

In this diagram, the transition from $F$ to $F'$ that occurs when the evader exercises a move
strategy is represented by a broken line, whereas the transition that occurs when the searcher
makes an unsuccessful look is represented by a solid line. This convention will be used in all
future chain diagrams, etc. Usually, however, any transition representing detection will be left
out. Then, as in the chain diagrams used in Chapter 2, all look transitions will represent
unsuccessful looks.

Payoff functions can be associated with both $F$ and $F'$ in the same manner as before. We
can let $U(P)$ represent the future payoff in game $F$ when the evader is in box 1 with probability
$P$ and both players use optimum strategies in the future. Similarly, we can define $U'(P)$ as the
corresponding payoff in game $F'$. Note that $P$ is used to represent the state variable in either
game.

The functional equations that express these payoffs in terms of each other are easily derived.
Given game $F'$, the searcher must decide which box should be examined, and his decision can be
based on the present value of $P$. The situation is exactly the same as in game $F^\infty$ except that if
the look is unsuccessful, game $F$ is played next. Therefore,

$$U'(P) = \min \begin{cases} U'(P;\,1) = 1 + (P r_1 + 1 - P)\; U\!\left[\dfrac{P r_1}{P r_1 + 1 - P}\right] \\[2ex] U'(P;\,2) = 1 + [P + (1 - P)\, r_2]\; U\!\left[\dfrac{P}{P + (1 - P)\, r_2}\right] \end{cases} \tag{4-3}$$


As  before,  U'(P;  i)  represents  the  payoff  that  results  if  box  i  is  examined  and  both  players  use 
optimum  strategies  thereafter. 

In game $F$, the evader has the opportunity to transform $P$ into some other $P'$. He must
weigh the cost of such a transformation against the future payoff $U'(P')$ in the subsequent game
$F'$. Clearly,

$$U(P) = \max_{P'} \left\{ -\mu\,|P' - P| + U'(P') \right\} . \tag{4-4}$$


These two functional equations are necessary and sufficient. That is, a unique pair of
functions $U(P)$ and $U'(P)$ exists that satisfies the above equations. Once they are known, the optimum
strategies for the two players can be found easily. This fundamental property, plus others of
interest, is developed in Appendix B. Since most of the properties are fairly clear once they are
stated, the proofs are not included here.
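One way to get a feel for Eqs. (4-3) and (4-4) is to iterate them numerically on a grid of $P$ values, starting from a payoff of zero. The sketch below is only an approximation under our own assumptions (uniform grid, linear interpolation between grid points, a fixed number of iterations, zero starting payoff); it is not the method of Appendix B, and all names in it are hypothetical.

```python
def interp(values, P):
    """Linear interpolation of samples on a uniform grid over [0, 1]."""
    n = len(values) - 1
    t = min(max(P, 0.0), 1.0) * n
    i = min(int(t), n - 1)
    return values[i] + (t - i) * (values[i + 1] - values[i])


def solve_payoffs(q1, q2, mu, n_grid=51, n_iter=150):
    """Approximate U(P) and U'(P) by repeated use of Eqs. (4-3) and (4-4)."""
    r1, r2 = 1.0 - q1, 1.0 - q2
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    U = [0.0] * n_grid            # assumed starting payoff
    U_prime = [0.0] * n_grid
    for _ in range(n_iter):
        # Eq. (4-3): the searcher picks the better of the two looks.
        for i, P in enumerate(grid):
            s1 = P * r1 + 1.0 - P            # look into box 1 fails
            s2 = P + (1.0 - P) * r2          # look into box 2 fails
            look1 = 1.0 + (s1 * interp(U, P * r1 / s1) if s1 > 0.0 else 0.0)
            look2 = 1.0 + (s2 * interp(U, P / s2) if s2 > 0.0 else 0.0)
            U_prime[i] = min(look1, look2)
        # Eq. (4-4): the evader picks the best target P' among grid points.
        U = [max(U_prime[j] - mu * abs(grid[j] - P) for j in range(n_grid))
             for P in grid]
    return grid, U, U_prime


if __name__ == "__main__":
    for mu in (0.0, 0.5, 10.0):
        _, U, _ = solve_payoffs(0.5, 0.5, mu)
        print(mu, max(U))
```

With $\mu = 0$ and $q_1 = q_2 = 1/2$ the iteration reproduces the zero-cost value $(1/q_1) + (1/q_2) = 4$; for a large moving cost the maximum of the computed $U(P)$ is smaller, in line with the statement that the value decreases as $\mu$ increases.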

It is interesting to note in passing, however, that the proofs are accomplished by the use of
the truncated games $F_n$ and $F_n'$. Given $F_n$, the evader exercises a move strategy and $F_n'$ is
played. If the next look is unsuccessful, $F_{n-1}$ follows and this process continues down to $F_0$.
If the evader survives until this point, the game stops and he collects an additional reward that
is independent of $P$. It is shown that as $n$ approaches infinity, $U_n(P)$ and $U_n'(P)$ approach
limiting forms $U(P)$ and $U'(P)$, respectively. Furthermore, it is shown that as $n$ approaches infinity,
the probability that the game will last until $F_0$ approaches zero. It follows that these limiting
functions must satisfy Eqs. (4-3) and (4-4). Furthermore, most properties that hold for all $F_n$
or all $F_n'$ must apply equally well in $F$ or $F'$, respectively.

An  important  property  developed  in  this  manner  is  that  both  U(P)  and  U'(P)  are  continuous 
and  convex.  We  shall  see  in  Sec.  4.6  that  they  are  also  piecewise  linear  over  the  entire  interval 
(0,  1)  if  the  moving  cost  is  not  prohibitive.  The  quantity  U'(P)  is  represented  by  a  function  of 
this  form  in  Fig.  4. 


Fig.  4.  A  typical  payoff  function. 


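With Fig. 4 in mind, we are in a position to find the simple manner in which $U(P)$ is related
to $U'(P)$ and the general form of the evader's optimum strategy. The points $P_-$ and $P_+$ included
in this figure are defined as follows:

$$\frac{dU'(P)}{dP} \begin{cases} \ge \mu , & P < P_- \\ < \mu , & P > P_- \end{cases} \; ; \qquad \frac{dU'(P)}{dP} \begin{cases} > -\mu , & P < P_+ \\ \le -\mu , & P > P_+ \end{cases} \; . \tag{4-5}$$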

Equation (4-4) states that

$$U(P) = \max_{P'} \left\{ -\mu\,|P - P'| + U'(P') \right\} .$$

A little thought will show that if $P_- \le P \le P_+$, then $U(P) = U'(P)$. On the other hand, if $P < P_-$,
it follows that $U(P) = -\mu(P_- - P) + U'(P_-)$, and if $P > P_+$, then $U(P) = -\mu(P - P_+) + U'(P_+)$. If
$P$ lies in the interval $(P_-, P_+)$, the evader should not move before the next look, and we shall
call this interval the no-move region. If $P$ is less than $P_-$, the evader should transform $P$ to
$P_-$ by moving to box 1 if he is in box 2 with probability $x_1 = (P_- - P)/(1 - P)$. If $P$ is greater
than $P_+$, he should transform $P$ into $P_+$ by moving to box 2 if he is in box 1 with probability
$x_2 = (P - P_+)/P$. The intervals $(0, P_-)$ and $(P_+, 1)$ will be called moving regions.
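The relation between $U(P)$ and $U'(P)$ can be checked against a direct maximization. The sketch below uses a smooth toy function in place of $U'(P)$ purely for illustration (the payoff functions of the game are piecewise linear); the toy function, the value of $\mu$, and the resulting boundaries $P_- = 1/4$ and $P_+ = 3/4$ are our own assumptions.

```python
def U_from_U_prime(P, U_prime, P_minus, P_plus, mu):
    """U(P) built from U'(P): unchanged inside the no-move region, and
    replaced by segments of slope +mu and -mu in the moving regions."""
    if P < P_minus:
        return -mu * (P_minus - P) + U_prime(P_minus)
    if P > P_plus:
        return -mu * (P - P_plus) + U_prime(P_plus)
    return U_prime(P)


def brute_force_U(P, U_prime, mu, n=1000):
    """Direct evaluation of max over P' of -mu*|P - P'| + U'(P') on a grid."""
    return max(-mu * abs(P - j / n) + U_prime(j / n) for j in range(n + 1))


def toy(P):
    """Toy stand-in for U'(P); its slope is 4 - 8P, so with mu = 2 the
    no-move region is (1/4, 3/4)."""
    return 4.0 * P * (1.0 - P)


if __name__ == "__main__":
    mu = 2.0
    for P in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(P, U_from_U_prime(P, toy, 0.25, 0.75, mu), brute_force_U(P, toy, mu))
```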

In the next section we shall see that the magnitude of the slope of $U'(P)$ must be greater than
$\mu$ at $P = 0$ if $q_2 \ne 1$ and at $P = 1$ if $q_1 \ne 1$. Therefore, except in these unusual cases, $P_-$ and
$P_+$ belong to the interior of the interval (0, 1). These points are defined with care because $U'(P)$
is usually piecewise linear, and it is possible for one of these segments to have a slope whose
magnitude is exactly equal to $\mu$.

The function $U(P)$ can be constructed from $U'(P)$ by replacing this function in the moving
regions $(0, P_-)$ and $(P_+, 1)$ by tangent segments with slopes $\mu$ and $-\mu$, respectively. This is
shown in Fig. 5. Clearly, $U(P)$ achieves its maximum inside the no-move region.


Fig. 5. The relationship between $U(P)$ and $U'(P)$.


The final property that is developed in Appendix B by means of the truncated games
concerns the searcher's optimum strategy in $F'$, namely:

There exists a $P_0$, where $P_- < P_0 < P_+$, such that

$$P < P_0 \;\Rightarrow\; U'(P;\,2) < U'(P;\,1) \;\Rightarrow\; \text{look into box 2}$$

$$P > P_0 \;\Rightarrow\; U'(P;\,1) < U'(P;\,2) \;\Rightarrow\; \text{look into box 1}$$

Thus, the searcher's optimum strategy is quite similar to that in $F^\infty$. Here, however, $P_0$ will
in general be a function of the moving cost in addition to the detection probabilities. In $F^\infty$, $P_0$
was simply equal to $q_2/(q_1 + q_2)$.

4.4  PROHIBITIVE  MOVING  COST 

It has been mentioned that a finite bound $\mu_p$ usually exists above which the moving cost is
prohibitive. That is, for any $\mu$ greater than $\mu_p$, game $G$ behaves in a manner essentially
identical to that of $G^\infty$. In particular, the value and the searcher's good strategy are identical to
those in $G^\infty$; the evader's good strategy requires him to hide initially in box 1 with the same
probability $P^*$ as in $G^\infty$, and finally the evader should never move as long as the searcher uses
his good strategy.

In this section, the conditions that insure this behavior will be considered, and we shall find
that $\mu_p$ may be obtained from the payoff function $U^\infty(P)$. Furthermore, we shall find that the
evader's complete good strategy, including the rule for moving when the searcher does not use
a good strategy, can be easily obtained once $U^\infty(P)$ is known. It follows that one can determine
whether $\mu$ is prohibitive and compute the good strategies from the solution of the no-move game.

Let us first consider more closely the conditions under which the evader should move. In
the last section we found that the evader should exercise a moving strategy if $P$ does not belong
to the no-move region $(P_-, P_+)$. Such a strategy transforms $P$ to the nearest boundary of
$(P_-, P_+)$, and since the game should start at $P^*$, which lies inside this region, moving is required
only when the state variable is transformed out of the no-move region by an unsuccessful look.
If $P$ cannot be removed from the no-move region by an optimum look, the evader will never
need to move when the searcher uses an optimum strategy.

In order to determine when this occurs, let us define the recurrent region $(P_{01}, P_{02})$ in the
same manner as in $F^\infty$. That is, let

$$P_{01} = \frac{P_0 r_1}{P_0 r_1 + 1 - P_0} , \qquad P_{02} = \frac{P_0}{P_0 + (1 - P_0)\, r_2} . \tag{4-6}$$

Since the searcher should look into box 1 if $P$ is greater than $P_0$ and into box 2 if $P$ is less than
$P_0$, $P$ can never be removed from $(P_{01}, P_{02})$ by an optimum look. Also, $P$ can never be
removed from this region by the evader if he uses his optimum strategy, because such an optimum
strategy can only shift $P$ in the direction of $P_0$ but not beyond it. Therefore, for any $\mu$, the
recurrent region has the same property that it had in $F^\infty$.
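The invariance of the recurrent region under optimum looks can be illustrated numerically. The sketch below takes $P_{01}$ and $P_{02}$ to be the images of $P_0$ under an unsuccessful look into box 1 and box 2, respectively, and applies the rule "look into box 1 if $P > P_0$, otherwise box 2" to sample points of the region. The sample values $q_1 = 0.5$, $q_2 = 0.25$, and $P_0 = q_2/(q_1 + q_2)$ are assumptions for the demonstration; in general $P_0$ depends on $\mu$ as well.

```python
def after_unsuccessful_look(P, box, q1, q2):
    """State variable after an unsuccessful look into `box`."""
    r1, r2 = 1.0 - q1, 1.0 - q2
    if box == 1:
        return P * r1 / (P * r1 + 1.0 - P)
    return P / (P + (1.0 - P) * r2)


def recurrent_region(P0, q1, q2):
    """End points (P01, P02): images of P0 under a look into box 1 and box 2."""
    return (after_unsuccessful_look(P0, 1, q1, q2),
            after_unsuccessful_look(P0, 2, q1, q2))


def optimum_look(P, P0):
    """Box 1 if P exceeds P0, otherwise box 2 (ties broken toward box 2)."""
    return 1 if P > P0 else 2


if __name__ == "__main__":
    q1, q2 = 0.5, 0.25
    P0 = q2 / (q1 + q2)
    P01, P02 = recurrent_region(P0, q1, q2)
    print(P01, P02)
    for i in range(11):
        P = P01 + (P02 - P01) * i / 10.0
        print(P, after_unsuccessful_look(P, optimum_look(P, P0), q1, q2))
```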

The condition under which $P$ cannot be removed from the no-move region by an optimum
look should now be clear. The no-move region $(P_-, P_+)$ must contain the recurrent region
$(P_{01}, P_{02})$. That is, we require that $P_- \le P_{01}$ and $P_{02} \le P_+$. If $P$ belongs to $(P_{01}, P_{02})$, it will
remain there and hence inside $(P_-, P_+)$. On the other hand, if $P$ belongs to the no-move region
but not to the recurrent region, it will soon be transformed into the recurrent region by means
of a sequence of optimum looks. During this process, $P$ moves towards $(P_{01}, P_{02})$ and
therefore cannot leave the no-move region.

When this condition holds, $U(P)$ must be equal to $U'(P)$ throughout the recurrent as well as
the no-move region. We may therefore derive $P_0$, the unique point at which a look into either
box is optimum, in the same manner as it was derived in Chapter 2. Since

$$U'(P_0;\,1) = U(P_0;\,1) , \qquad U'(P_0;\,12) = U(P_0;\,12) , \quad \text{etc.},$$

the derivations are identical and

$$P_0 = \frac{q_2}{q_1 + q_2} .$$

It follows that $P_{01}$ and $P_{02}$ have the same values as in $F^\infty$. Furthermore, both $U(P)$ and $U'(P)$
are identical to $U^\infty(P)$ throughout the no-move region. In Sec. 4.3, the no-move region was
defined as the interval over which $|dU'(P)/dP|$ is less than $\mu$. Therefore, the no-move region
contains the recurrent region and $\mu$ is prohibitive if and only if $|dU'(P)/dP|$ is less than $\mu$ for
all $P$ belonging to $(P_{01}, P_{02})$. Since $U^\infty(P)$ is convex, we may define $\mu_p$ by

$$\mu_p = \max \left\{ \left|\frac{dU^\infty(P)}{dP}\right|_{P_{01}} ,\; \left|\frac{dU^\infty(P)}{dP}\right|_{P_{02}} \right\} . \tag{4-7}$$

The quantity $\mu_p$ will be finite as long as the magnitude of the slope of $U^\infty(P)$ is finite over the
recurrent region. This will be true if both $q_1$ and $q_2$ are unequal to one, since both $P_{01}$ and $P_{02}$
belong to the interior of (0, 1) under this condition. When both detection probabilities equal one,
$P_{01} = 0$ and $P_{02} = 1$. In this case, however, $U^\infty(P)$ consists of only two linear segments, both
of finite slope, and $\mu_p = 1$. The only case where $\mu_p$ is infinite occurs when one, but not both, of
the detection probabilities is equal to unity. For example, when $q_2 = 1$ and $q_1 \ne 1$, $P_{02} = 1$ and
the magnitude of the slope of $U^\infty(P)$ approaches infinity as $P$ approaches one.
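The case of two unity detection probabilities can be worked out by hand, as a check on the statement that $\mu_p = 1$. The derivation below is our own sketch: with $q_1 = q_2 = 1$ every look is conclusive, so the only sensible search sequences in the no-move game are "box 1, then box 2" and "box 2, then box 1".

```latex
% Sequence (1, 2): found on look 1 if in box 1, on look 2 otherwise.
% Sequence (2, 1): found on look 1 if in box 2, on look 2 otherwise.
\begin{align*}
U^\infty(P) &= \min\{\, P \cdot 1 + (1 - P) \cdot 2,\;\; P \cdot 2 + (1 - P) \cdot 1 \,\}
             = \min\{\, 2 - P,\;\; 1 + P \,\}, \\
P^* &= \tfrac{1}{2}, \qquad v^\infty = U^\infty(P^*) = \tfrac{3}{2}, \\
\left|\frac{dU^\infty(P)}{dP}\right| &= 1 \ \text{on both segments}
  \quad\Longrightarrow\quad \mu_p = 1 .
\end{align*}
```

Both segments have finite slope all the way to the end points $P_{01} = 0$ and $P_{02} = 1$, which is why the bound stays finite here.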

A heuristic argument supports the fact that $\mu_p$ is infinite when just one detection probability
is equal to one. If the searcher assumed that the evader could not move when $\mu$ was finite, he
would look only once into box 2 ($q_2 = 1$). After this look, the evader would be willing to pay any
finite price to move to box 2 if he were in box 1, since he could then survive for all time and
collect an infinite payoff.

Since U(P) is equal to U^∞(P) over the entire no-move region when μ > μ_p, the evader's complete
good strategy can be easily derived. The quantity U(P) is a maximum inside (P_-, P_+), and
the state variable P*, which should be chosen at the beginning of the game, and the maximum
guaranteed payoff U(P*) are the same as in Γ^∞. Since P can still be transformed outside of the
no-move region if the searcher does not use a good strategy, one must calculate the values of
the points P_- and P_+. This can be done by finding the points at which the magnitude of the slope
of U^∞(P) first exceeds μ. Note that both U(P) and U'(P) are unequal to U^∞(P) over the moving
regions.

The searcher's good strategy in Γ is identical to that in Γ^∞ as long as μ is greater than
μ_p. As we saw in Chapter 2, the good strategy consists of a random selection of the two infinite
sequences that are optimum in Γ^∞ at P*. Each of these sequences has an associated payoff
function U_i^∞(P) = a_i P + b_i(1 - P). After each look, the future sequence is the same as that
associated with another linear segment of U^∞(P) in (P_01, P_02) and can be represented in the same
manner. For all of these sequences,

\[ |a_i - b_i| = \left|\frac{dU_i^\infty(P)}{dP}\right| \]

must be less than μ. Therefore, even if the evader knows the future sequence, he will find it
unwise to move. The initial random selection of one of the two optimum sequences associated
with P* is such that it limits the evader to a payoff independent of P and equal to U^∞(P*) =
U(P*) = v^∞. When the searcher uses this good strategy the evader will receive this payoff as
long as he does not move.


4.5  BEHAVIOR OF THE PAYOFF FUNCTIONS WHEN μ IS LESS THAN μ_p

In Sec. 4.3, some general properties of the payoff functions were discussed, and in the last
section we saw that both of them were identical to U^∞(P) in the no-move region when the moving
cost was prohibitive. In this section we shall consider the case where the moving cost is not
prohibitive and examine the properties of the payoff functions more closely. Particular emphasis
will be placed on the manner in which these functions change as μ increases from zero to μ_p.

Before considering the case where 0 < μ < μ_p, however, it is worthwhile to consider the
appearance of the payoff functions when μ is equal to zero. In Chapter 3 we found that the evader
should always return the state variable to P_0 = q_2/(q_1 + q_2) before each look and that he could
guarantee a payoff equal to (1/q_1) + (1/q_2). It follows that P_- and P_+ are both equal to P_0 and
that the no-move region is simply a point. Since μ = 0, the quantity U(P) is simply equal to
U'(P_0) for all P and is thus a constant. In game Γ', however, the evader is not allowed to move
before the next look. Therefore, the payoff U'(P) will be a function of P. Using Eq. (4-3) and
noting that U(P) = (1/q_1) + (1/q_2) = V^0,

\[ U'(P) = \begin{cases} P(1 + r_1 V^0) + (1 - P)(1 + V^0), & P > P_0 \\ P(1 + V^0) + (1 - P)(1 + r_2 V^0), & P < P_0 \end{cases} \]

Both  U(P)  and  U'(P)  are  shown  in  Fig.  6,  where  the  convention  of  representing  U(P)  by  a  broken 
line  outside  the  no-move  region  is  used. 
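
The zero-cost payoffs can be evaluated numerically. The following is a minimal sketch, assuming only the expressions P_0 = q_2/(q_1 + q_2) and V^0 = 1/q_1 + 1/q_2 quoted above; the function name and the sample detection probabilities (those of the later example, q_1 = 0.488 and q_2 = 0.5904) are illustrative choices, not part of the original text.

```python
def zero_cost_payoffs(q1, q2):
    """Return (P0, V0, u_prime) for the game with moving cost mu = 0."""
    r1, r2 = 1.0 - q1, 1.0 - q2          # escape probabilities
    p0 = q2 / (q1 + q2)                  # state to which the evader always returns
    v0 = 1.0 / q1 + 1.0 / q2             # constant payoff U(P) when mu = 0

    def u_prime(p):
        # Payoff U'(P): the evader may not move before the next look.
        if p >= p0:                      # optimum look is into box 1
            return p * (1.0 + r1 * v0) + (1.0 - p) * (1.0 + v0)
        return p * (1.0 + v0) + (1.0 - p) * (1.0 + r2 * v0)   # look into box 2

    return p0, v0, u_prime


p0, v0, u_prime = zero_cost_payoffs(q1=0.488, q2=0.5904)
print(round(p0, 4), round(v0, 4), round(u_prime(p0), 4))
```

Both branches of U'(P) meet at P_0 with the common value V^0, and U'(P) falls off linearly on either side, which is the shape described for Fig. 6.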

If μ is very small but unequal to zero, the appearance of these functions can be only slightly
different. Since μ is unequal to zero, U(P) can no longer be flat but must decrease at a rate
equal to μ as it extends from each side of the no-move region. For any fixed P, U(P) and
U'(P) must be continuous functions of μ. They must be identical over the no-move region, and
U'(P) must be linear over an interval that is transformed into one of the moving regions by an
optimum look. It follows that the no-move region must still consist of the point P_0 (which can
change with μ) when μ is slightly greater than zero.


Fig. 6.  Payoffs when μ = 0.


Fig. 7.  Payoffs when μ is slightly greater than zero.


A pair of functions whose appearance satisfies the above properties is shown in Fig. 7. Here,
the magnitude of the slope of U(P) is equal to μ on either side of P_0. Now U'(P) consists of four
linear segments. As before, a breakpoint occurs at P_0, since the associated optimum look is
different on either side of P_0. The point P_{-4} is transformed into P_0 by an optimum look into box 2.
It is a breakpoint of U'(P) because the segments of U'(P) to the left and right of P_{-4} transform into
the segments of U(P) that are to the left and right of P_0, respectively. A similar effect occurs
at P_3, where P_3 is transformed into P_0 by an optimum look into box 1.

If we continue to increase μ from zero on up, these functions will keep on changing in a
continuous manner. The point P_0 may move, but P_{-4} and P_3 must be related to P_0 in the above manner.
The two linear segments of U(P) will continue to increase in steepness with μ, and so forth. Both
functions must retain the same general appearance until at some μ_1 either the segment of U(P) in
(P_0, 1) becomes tangent to that of U'(P) in (P_0, P_3), or the segment of U(P) in (0, P_0) becomes
tangent to that of U'(P) in (P_{-4}, P_0).

Because the general appearance of the payoff functions and hence the general behavior of the
associated optimum strategies for the two players are the same over the interval (0, μ_1), this
interval is called a strategy interval. When μ increases beyond μ_1, the payoff functions take on
new forms associated with the next strategy interval (μ_1, μ_2). We shall see that there is a
sequence of strategy intervals (0, μ_1), (μ_1, μ_2), ..., (μ_n, μ_p) over (0, μ_p). The appearance of the
payoff functions and the general behavior of the optimum strategies are the same over each
interval but change from interval to interval. We shall see that as μ goes from interval to
interval, these characteristics approach those associated with Γ^∞.

In order to extend this discussion in a more precise way, it is necessary to develop more fully
the properties of the payoff functions that have already been discussed. Recall that both
U(P) and U'(P) are continuous and convex, that U(P) is identical to U'(P) over the no-move region,
defined as the interval in which

\[ \left|\frac{dU'(P)}{dP}\right| < \mu , \]

and that the searcher's optimum strategy requires a look into box 2 if P is less than some
unique P_0 and a look into box 1 if P is greater than P_0.

Let us consider the linear relationships that exist between U(P) and U'(P). The function
U'(P) is equal to U'(P; 1) if P > P_0, and is equal to U'(P; 2) if P < P_0. If U(P) is linear over some
interval π_j, then U'(P; i) must be linear over the interval that is transformed into π_j by an
unsuccessful look into box i. Therefore, if U(P) is piecewise linear, U'(P) must be also. U(P) is
identical to U'(P) over the no-move region and is linear over each of the moving regions. Hence,
the reverse is also true. We shall assume that both functions are indeed piecewise linear. That
is, we shall assume that each function is partitioned into a set of linear segments by a set of
breakpoints. As long as we can show that the set of breakpoints associated with each payoff
function is finite, this assumption must be correct. We shall soon see that when r_1^{n_1} = r_2^{n_2}, U(P)
and U'(P) are piecewise linear over the entire interval (0, 1) as long as μ is finite. Also, we
shall see that both functions are piecewise linear over the interval (0, 1) in general if μ is strictly
less than μ_p.

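Let us consider the manner in which the breakpoints of the two functions are related to each
other:

\[ U'(P) = \min \begin{cases} U'(P; 1) = 1 + [P r_1 + 1 - P]\; U\!\left( \dfrac{P r_1}{P r_1 + 1 - P} \right) \\[2ex] U'(P; 2) = 1 + [P + (1 - P) r_2]\; U\!\left( \dfrac{P}{P + (1 - P) r_2} \right) \end{cases} \]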


The point P_0 must be a breakpoint of U'(P) because it is the unique point at which U'(P; 1) and
U'(P; 2) are equal. To either side of any other point, the same next look is optimum. Therefore,
such a point can be a breakpoint of U'(P) if and only if it is transformed by the next optimum look
into a breakpoint of U(P). U(P) is linear over each of the moving regions. Therefore, all its
breakpoints must belong to the no-move region where U(P) = U'(P). It follows that any breakpoint
of U'(P) other than P_0 must be transformed into some other breakpoint of U'(P) that belongs to
the no-move region. Such a point must eventually be transformed into P_0 and it cannot be
transformed into a moving region before this occurs. Therefore, the breakpoints other than P_0 that
are common to U(P) and U'(P) are those points belonging to the no-move region that are
transformed by an optimum search sequence into P_0 before leaving this region. The remaining
breakpoints of U'(P) are those points transformed into a breakpoint of the no-move region by the next
optimum look.

In Γ^∞, we found that the general behavior of the optimum search sequence when r_1^{n_1} = r_2^{n_2}
could be found by ordering the breakpoints in the recurrent region as follows:

\[ \ldots,\ P_{-2},\ P_{-1},\ P_0,\ P_1,\ P_2,\ \ldots \]

We saw that any interval to the right of P_0 was transformed n_2 places to the left by an optimum
look into box 1 and that any interval to the left of P_0 was transformed n_1 places to the right by
an optimum look into box 2. A chain diagram could be drawn which would show the manner in
which the linear intervals transformed into each other. Once this was done, it was a fairly
straightforward task to calculate the linear payoffs associated with each interval and the values of the
separating breakpoints.

In games Γ and Γ' a similar technique can be used. As we saw in the last section, the
recurrent region (P_01, P_02) has the same properties that it has in Γ^∞. Although we may no longer
equate P_0 to q_2/(q_1 + q_2), we can still order the points ..., P_{-1}, P_0, P_1, ... as before. These
are the points belonging to the recurrent region that would be transformed into P_0 by an optimum
search sequence if no moving were to occur. Therefore, these are the only points belonging to
the recurrent region that can be breakpoints of U(P) or U'(P). If μ is less than μ_p, at least one
of the moving regions must extend into the recurrent region. In this case, some of these points
cannot be breakpoints of U(P) and usually some of them will not be breakpoints of U'(P) either.

As an example, consider again the case where r_1^4 = r_2^3 and suppose that the no-move region
is (P_{-1}, P_3). It is a simple matter to find where the breakpoints occur. They are shown in Fig. 8.


Fig. 8.  The form of a possible pair of payoff functions.

Breakpoints of both U(P) and U'(P) occur at P_{-1}, P_0, P_2 and P_3. In addition, P_{-2}, which belongs
to the recurrent but not the no-move region, is a breakpoint of U'(P). Breakpoints of U'(P) that
lie outside the recurrent region are not shown.

This set of breakpoints is consistent. Each breakpoint of U'(P) to the left of P_0 transforms
into a breakpoint in U(P) four (n_1) places to the right and so forth. Therefore, it is possible
that the payoff functions may take on this form over some range in μ. A little thought will show,
on the other hand, that if (P_{-2}, P_2) were guessed for the no-move region, inconsistencies would
develop.
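
The consistency of the breakpoint set can be checked numerically. The sketch below is illustrative only: it assumes r_1 = 0.512 and r_2 = 0.4096 (so that r_1^4 = r_2^3 with common base 0.8), borrows the value P_0 = 0.538 from Table I, and indexes the breakpoints by scaling the odds P/(1 - P) of P_0 by powers of 0.8. The function names are ours.

```python
def look(p, box, r1, r2):
    """Posterior probability of box 1 after an unsuccessful look into `box`."""
    if box == 1:
        return p * r1 / (p * r1 + 1.0 - p)
    return p / (p + (1.0 - p) * r2)


def nth_breakpoint(k, p0, rho):
    """Breakpoint P_k: the odds of P_0 scaled by rho**(-k)."""
    odds = p0 / (1.0 - p0) * rho ** (-k)
    return odds / (1.0 + odds)


r1, r2, rho = 0.512, 0.4096, 0.8      # r1 = rho**3 and r2 = rho**4
p0 = 0.538                            # assumed value of P_0 (Table I)

for k in (-2, -1, 2, 3):
    box = 1 if k > 0 else 2           # box 1 to the right of P_0, box 2 to the left
    before = nth_breakpoint(k, p0, rho)
    after = look(before, box, r1, r2)
    print(k, round(before, 3), "->", round(after, 3))
```

With these values a look into box 1 carries a point three places to the left and a look into box 2 carries it four places to the right, so P_3 goes to P_0, P_2 to P_{-1}, P_{-1} to P_3 and P_{-2} to P_2, in agreement with the breakpoints listed for Fig. 8.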

4.6  EXAMPLE: r_1^4 = r_2^3

It is appropriate at this point to return to our study of the manner in which the payoff
functions behave as μ goes from zero to μ_p. In particular, let us again consider the example where
r_1^4 = r_2^3. Figure 9 shows the various forms that these functions assume in the recurrent region
(P_{-3}, P_4). The boundary points of the no-move region are denoted by circles. The linear
intervals are numbered in order, π_{-1}, π_{-2}, ..., and π_1, π_2, ..., starting from P_0 and working toward
P_- and P_+. Note that there is no longer any general correspondence between the subscript of an
interval π_j and the subscripts of its bounding breakpoints. The linear intervals of U(P) in the moving
regions are designated by π_- and π_+, and U_-(P) and U_+(P) have slopes equal to μ and -μ,
respectively. The linear intervals of U'(P) that immediately adjoin (P_-, P_+) are designated by π'_- and π'_+.


Fig. 9(a-f).  The forms of the payoff functions for a set of strategy intervals (r_1^4 = r_2^3).
In Fig. 9(a), the general form of the payoff functions in the recurrent region is shown for μ
close to zero. This result agrees with the discussion at the beginning of Sec. 4.5. As μ
increases, the segments in π_- and π_+ increase in steepness. At some point, either P_- will shift
to P_{-4} or P_+ will shift to P_3. In general, there is no simple way to predict which shift will
occur. In the next section, where the actual computation of the payoff functions is considered, a
method for computing which shift occurs first will be discussed. In this example, U_+(P) becomes
tangent to U'_+(P) and P_+ shifts to P_3. The second strategy interval yields functions of the form
shown in Fig. 9(b). Here a new breakpoint appears in U'(P) at P_{-1} as a result of the breakpoint
introduced at the new position of P_+.

As the moving cost increases further, we find the behavior exhibited in Fig. 9(c-e). In Fig. 9(e),
P_- has shifted to the edge of the recurrent region. When U_+(P) becomes tangent to U'_+(P), μ is
equal to μ_p. Figure 9(f) shows the general form when the moving cost is prohibitive. Here, of
course, P_0 = q_2/(q_1 + q_2).

If both q_1 and q_2 are unequal to one, the magnitude of the slope of U^∞(P) becomes arbitrarily
large as P approaches zero and one. Therefore, as long as μ is finite, P_- must be greater
than zero and P_+ must be less than one. It follows that U(P) and U'(P) can have only a finite
number of breakpoints in the no-move region when r_1^{n_1} = r_2^{n_2}. Furthermore, U'(P) can have only
a finite number of breakpoints in either moving region. Therefore, when r_1^{n_1} = r_2^{n_2}, U(P)
and U'(P) will be piecewise linear over the interval (0, 1) if μ is finite.

Chain diagrams that illustrate the behavior of the optimum strategies can be drawn in a
manner quite similar to that used in Chapter 2. Those associated with the various strategy intervals
of our example are shown in Fig. 10. In these chain diagrams, each state s_j is associated with
the interval π_j and has an associated payoff function. A transition from one state to another
produced by an optimum look is represented by a solid line. The transitions from state s_- to s'_- and
from s_+ to s'_+ occur when the evader moves (with the proper probability) and are represented by
broken lines. In general, each linear interval that belongs to both the no-move and the recurrent
regions will be represented by a single state s_j in the chain diagram since U_j(P) and U'_j(P) are
identical. In addition to these states, s_- and s'_- will be included in the chain if P_- belongs to the
interior of the recurrent region and s_+ and s'_+ will be included if P_+ does. Here we must
differentiate between the states associated with U(P) and U'(P) since these payoffs are different.

In the case where q_1 ≠ 1 and q_2 = 1, the behavior of the payoff functions is quite similar to
that found in the above example. In this case, the breakpoints occur at P_0, P_1, ..., P_k, ..., P_02,
where P_k is transformed into P_0 by k looks into box 1. There is one linear segment over
(0, P_0). In the first strategy interval, P_- = P_+ = P_0. As μ increases, P_+ will shift from P_0 to
P_1 to P_2 and so forth. At some point, P_- must shift from P_0 to zero, and as μ increases
further, P_+ will continue to shift to the right, point by point. Since P_k approaches P_02 = 1 only as
k approaches infinity, this process will continue indefinitely. Over any strategy interval, the
chain diagram can be drawn in the same manner as in the example where r_1^4 = r_2^3.

In Chapter 2, we found that a chain diagram could not be associated with the linear segments
of U^∞(P) in the recurrent region when log r_2/log r_1 was irrational because there were an infinite
number of segments. That is, there were an infinite number of points in (P_01, P_02) that were
eventually transformed into P_0 by the optimum search sequence. In games Γ and Γ' this no
longer occurs as long as μ is strictly less than μ_p. In this event, at least one of the moving
regions must extend into the recurrent region, and only a finite number of points can be
transformed into P_0 by an optimum sequence without leaving the no-move region. As a result, these
points partition the payoff functions U(P) and U'(P) into a finite number of linear segments and
both functions are piecewise linear. A finite chain diagram that illustrates the manner in which
these segments transform into each other can be drawn.

A consequence of this result is that an n_1/n_2 approximation of log r_2/log r_1 may be used to
obtain an exact solution of the payoff function when μ < μ_p. The reason that an exact solution can
be obtained by assuming a good choice of n_1 and n_2 follows from the fact that the actual
computations of the payoff functions depend only on the chain diagram used. If the correct chain diagram
is found, the resulting solution will be correct. The n_1/n_2 approximation can be used as a
device for generating a sequence of chain diagrams. As long as the approximation is sufficiently
accurate, it will produce a sequence of chains as μ increases that will agree with the sequence
associated with the irrational case up to the correct one. The former sequence will merely be
finite whereas the latter is infinite. Clearly, the approximation must be increasingly more
accurate and the resulting chain diagram will become increasingly large as μ approaches μ_p.

As an example, let us approximate log r_2/log r_1 by 3/2. In this case, the 3/2
approximation will yield a sequence of three chain diagrams identical to the first three in Fig. 10. If
μ is sufficiently small, one of these three will be the correct one and the correct solution can be
obtained.

4.7  COMPUTATION  OF  THE  PAYOFF  FUNCTIONS 

The computation of the payoff functions is accomplished in two steps. First, the correct
chain diagram must be found. Once this has been done, the payoff functions U_j(P) and U'_j(P)
associated with each interval π_j that has a corresponding state s_j in the chain can be calculated.
Finally, the separating breakpoints can be found. In the last section we saw that the chain
diagrams changed from one strategy interval to another and that at the end of each strategy interval
two possible changes could occur. In this section we shall see how the correct change can be
determined. The required computations, although simpler, are quite similar to those used to
compute the actual payoff functions within a strategy interval. Therefore, we shall consider the
latter problem first.

4.7.1  Computation  of  the  Payoff  Functions  When  the  Correct 
Chain  Diagram  Is  Known 

The  chain  diagram  associated  with  a  given  strategy  interval  contains  all  the  information 
needed  for  computing  the  payoff  functions.  In  order  to  clarify  the  discussion,  the  chain  diagram 
in  Fig.  10(c)  will  be  used  as  an  example. 


This diagram is typical of those that occur when both P_- and P_+ belong to the interior of the
recurrent region. Each linear interval belonging to the no-move region, where U(P) = U'(P),
has a corresponding state in the chain. The states associated with π_-, π'_-, π_+ and π'_+ are also
included, since these intervals extend into the recurrent region. In such a chain there is a
single loop. Moving occurs only in the transition from s_- to s'_- and from s_+ to s'_+. These two pairs
of states divide the loop into two parts.

The linear payoffs associated with each state can be expressed in the same form used in Γ^∞.
For any state s_j associated with an interval in the no-move region we can write

\[ U_j(P) = U'_j(P) = a_j P + b_j (1 - P) . \]

Furthermore, we can let

\[ U_-(P) = a_- P + b_- (1 - P) , \]
\[ U'_-(P) = a'_- P + b'_- (1 - P) , \]

and so forth.

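If two states are not separated from each other by a move transition, their payoffs are
related to each other in the same manner as in Γ^∞. If a look sequence represented by (k_1, k_2)
transforms s_i into s_j and no move transitions intervene, we may use Eq. (2-13) to write

\[ a_i = q_1 \sum_{n=1}^{k_1} t_1(n)\, r_1^{\,n-1} + r_1^{\,k_1} (k_1 + k_2 + a_j) , \]
\[ b_i = q_2 \sum_{n=1}^{k_2} t_2(n)\, r_2^{\,n-1} + r_2^{\,k_2} (k_1 + k_2 + b_j) . \tag{4-8} \]

Here t_i(n) is the position within the sequence of the n-th look into box i.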


As before, k_i represents the total number of looks into box i during the transition. Since at
least one move transition occurs in the chain when μ is less than μ_p, the payoff associated with
a given state can never be expressed in terms of itself and solved directly. The above equations
can be used, however, to express U'_-(P) in terms of U_+(P) and U'_+(P) in terms of U_-(P). This
will yield four equations involving the eight unknowns a_-, b_-, a'_-, b'_-, a_+, b_+, a'_+ and b'_+.
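
The relation between the payoffs of two states joined by a look sequence can be sketched in code. This is a minimal illustration, assuming that Eq. (4-8) weights the n-th look into a box by the position of that look within the sequence; the function name `chain_payoff` and the list encoding of the sequence are ours.

```python
def chain_payoff(seq, a_j, b_j, q1, q2):
    """Payoff coefficients (a_i, b_i) of the state that the look sequence
    `seq` (for example [2, 1, 2]) carries into a state with (a_j, b_j)."""
    r1, r2 = 1.0 - q1, 1.0 - q2
    a_i = b_i = 0.0
    n1 = n2 = 0                            # looks made so far into box 1 / box 2
    for t, box in enumerate(seq, start=1):
        if box == 1:
            a_i += q1 * t * r1 ** n1       # evader in box 1, detected at look t
            n1 += 1
        else:
            b_i += q2 * t * r2 ** n2       # evader in box 2, detected at look t
            n2 += 1
    total = len(seq)
    a_i += r1 ** n1 * (total + a_j)        # evader in box 1 escaped every look
    b_i += r2 ** n2 * (total + b_j)        # evader in box 2 escaped every look
    return a_i, b_i


# Sequence 2,1,2 ending in a state with a_j = 2.747 and b_j = 4.047 (Table I values).
print(chain_payoff([2, 1, 2], 2.747, 4.047, q1=0.488, q2=0.5904))
```

Used with the rounded Table I coefficients of U_+(P), the sequence 2, 1, 2 returns coefficients that agree with those listed for U'_-(P) to about three decimal places.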

Other properties that can be utilized in order to get a complete set of equations are as follows.
First, the magnitudes of the slopes of U_-(P) and U_+(P) are equal to μ. Therefore,

\[ a_- - b_- = \mu , \]
\[ b_+ - a_+ = \mu . \tag{4-9} \]

Also, U_-(P) and U'_-(P) must intersect at P_-, and U_+(P) and U'_+(P) must intersect at P_+. This
yields the equations

\[ a_- P_- + b_- (1 - P_-) = a'_- P_- + b'_- (1 - P_-) , \]
\[ a_+ P_+ + b_+ (1 - P_+) = a'_+ P_+ + b'_+ (1 - P_+) , \tag{4-10} \]

bringing the total to eight equations. With the addition of the unknowns P_- and P_+, however, there
are now ten unknowns.
These points are the bounding points of the no-move region. They are related to P_0 by the
manner in which they are transformed into it by the optimum search sequence. These sequences
can be found by looking at the chain diagram. In particular, P_- → P_0 as s'_- → s_{-1}, and P_+ → P_0
as s'_+ → s_1. No moving transitions occur during the above processes since the states in the
chain diagram form a single loop in which

\[ s'_- \to \cdots \to s_{-1} \to \cdots \to s_+ \]

and

\[ s'_+ \to \cdots \to s_1 \to \cdots \to s_- . \]

As before, if P_j is transformed into P_i by k_1 unsuccessful looks into box 1 and k_2 unsuccessful looks into box 2,

\[ P_j \xrightarrow{(k_1,\,k_2)} P_i , \qquad P_i = \frac{P_j\, r_1^{\,k_1}}{P_j\, r_1^{\,k_1} + (1 - P_j)\, r_2^{\,k_2}} . \tag{4-11} \]


The two equations of this form introduce the additional unknown P_0 and one more equation
is necessary to complete the set. The final property which can be used is that a look into either
box is optimum when P is equal to P_0. Since P_01 belongs to π_- and P_02 belongs to π_+,

\[ U(P_0) = 1 + [P_0 r_1 + 1 - P_0]\, U_-(P_{01}) = 1 + [P_0 + (1 - P_0) r_2]\, U_+(P_{02}) , \]

which reduces to the equation

\[ a_- r_1 P_0 + b_- (1 - P_0) = a_+ P_0 + b_+ r_2 (1 - P_0) . \tag{4-12} \]


This  completes  the  set  of  equations  from  which  a  solution  can  be  obtained.  It  should  be  noted 
that  the  number  of  states  in  the  chain  has  no  effect  on  the  number  of  equations  required,  and 
these equations apply whenever both P_- and P_+ belong to the interior of the recurrent region.

Most  of  the  above  equations  are  linear  and  express  only  one  unknown  in  terms  of  another.  The 
complete  set  can  be  reduced  to  a  single  cubic  equation  in  a  fairly  direct  manner.  It  is  usually 
most convenient to derive this cubic equation in terms of P_0. Once this has been done, the other
variables  in  the  set  of  equations  can  be  obtained  easily.  Finally,  the  payoff  functions  associated 
with  the  other  states  in  the  chain  and  the  remaining  breakpoints  can  be  calculated  by  using  the 
same  techniques  used  in  Chapter  2. 

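To illustrate the general method, let us write the equations appropriate to the chain diagram
at the beginning of this section. We see that s'_- is transformed into s_+ by the sequence 212 and
s'_+ is transformed into s_- by the sequence 11. From Eq. (4-8) it follows that

\[ a'_- = q_1(2) + r_1 (3 + a_+) , \]
\[ b'_- = q_2 (1 + 3 r_2) + r_2^2 (3 + b_+) , \]
\[ a'_+ = q_1 (1 + 2 r_1) + r_1^2 (2 + a_-) , \]
\[ b'_+ = 2 + b_- . \]

Equations (4-9), (4-10) and (4-12) may be used directly. Finally, since P_+ is transformed into P_0
by the sequence 1 and P_- is transformed into P_0 by the sequence 21, we may write

\[ P_+ = \frac{P_0}{P_0 + (1 - P_0)\, r_1} \]

and

\[ P_- = \frac{P_0 r_2}{P_0 r_2 + (1 - P_0)\, r_1} . \]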

These equations and the chain diagram are appropriate for the case where r_1 = 0.512 and
r_2 = 0.4096 (r_1^4 = r_2^3) when μ is equal to 1.3. These are the same escape probabilities used in
the example in Chapter 2. The payoff functions which result are presented in Table I and graphed
in Fig. 11. The quantity U(P) is a maximum at P* = P_0 = 0.538 and is equal to 3.243. Therefore, the
evader should initially hide in box 1 with this probability and can guarantee a payoff equal to
3.243. The quantities P_- and P_+ are equal to 0.482 and 0.694, respectively, and define the
no-move region. With these, the evader's moving strategy is easily calculated. If P is less than
P_-, he should move to box 1 if in box 2 with probability x_1 = (0.482 - P)/(1 - P). If P is greater
than P_+, he should move to box 2 if in box 1 with probability x_2 = (P - 0.694)/P.
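
The moving rule can be written as a short routine. The sketch below assumes only the two probabilities quoted above, generalized to arbitrary boundary points P_- and P_+; the function names are ours. It also verifies the purpose of the rule, namely that the randomized move returns the state variable to the nearer boundary of the no-move region.

```python
def move_probability(p, p_minus, p_plus):
    """Return (from_box, prob): the box the evader should leave and the
    probability of moving, for state p and no-move region [p_minus, p_plus]."""
    if p < p_minus:
        return 2, (p_minus - p) / (1.0 - p)   # leave box 2 for box 1
    if p > p_plus:
        return 1, (p - p_plus) / p            # leave box 1 for box 2
    return None, 0.0                          # inside the no-move region: stay


def state_after_move(p, from_box, x):
    """Probability that the evader is in box 1 after the randomized move."""
    if from_box == 2:
        return p + (1.0 - p) * x
    if from_box == 1:
        return p * (1.0 - x)
    return p


for p in (0.30, 0.50, 0.90):
    box, x = move_probability(p, 0.482, 0.694)
    print(p, box, round(x, 4), round(state_after_move(p, box, x), 4))
```

For P = 0.30 the state is returned to 0.482, for P = 0.90 it is returned to 0.694, and for P = 0.50 no move is made.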

It is worth noting that the equations used to compute these functions did not make use of the
fact that r_1^4 = r_2^3. The solution is correct because the correct chain diagram was used and no
contradictions occurred. The contradictions that would arise if the wrong diagram were used
are quite simple: either the magnitude of the slope of U'(P) would be less than μ in a moving
region (in π'_- or π'_+) or it would exceed μ somewhere inside the no-move region. The slope of
each linear segment π_j is equal to a_j - b_j and is included in Table I.

If only P_+ or P_- belongs to the interior of the recurrent region, the solution is somewhat
simpler. As an example, consider the chain diagram in Fig. 10(e). Here only P_+ belongs to the
interior of the recurrent region, and neither s_- nor s'_- occurs in the chain. As a result, a'_+ and
b'_+ may be expressed in terms of a_+ and b_+ by means of Eq. (4-8). None of the equations that
involve P_-, a_-, b_-, a'_- and b'_- are required. On the other hand, Eq. (4-12) must be rewritten and
some new variables must be introduced into the set of equations. Previously, a_- and b_- were
included in this equation because P_01 belonged to the interval π_-. In this example, P_01 is the
breakpoint separating π_- and π_{-3}. Since s_{-3} is included in the chain diagram, while s_- is not, it
is worthwhile to rewrite Eq. (4-12) in the form

\[ a_{-3}\, r_1 P_0 + b_{-3} (1 - P_0) = a_+ P_0 + b_+ r_2 (1 - P_0) . \]

We can express a_{-3} and b_{-3} in terms of a_+ and b_+, respectively, by means of Eq. (4-8). The set
of equations that results can be reduced to a single quadratic equation in P_0. In other respects,
the solution is accomplished in the same manner as before.

4.7.2  Determination  of  the  Correct  Chain  Diagram 

Unless  one  wishes  to  guess  the  form  of  the  chain  diagram  that  applies  for  a  given  pair  of 
boxes  and  a  particular  moving  cost,  one  can  examine  the  manner  in  which  the  form  of  the  payoff 
functions  changes  from  one  strategy  interval  to  another.  Two  problems  should  be  apparent. 

First, one must find which of the two possible changes occurs when μ moves from one strategy
interval to another. In addition, one must determine the point at which the change occurs, i.e.,
the value of the moving cost at the change point.


Fig. 11.  Payoff functions (r_1 = 0.512, r_2 = 0.4096, μ = 1.3).


TABLE I

PAYOFF FUNCTIONS (r_1 = 0.512, r_2 = 0.4096, μ = 1.3)

Payoff Function                         Range             Slope (a_j - b_j)

U'_-(P)   = 3.918P + 2.498(1 - P)       (0.427, 0.482)     1.420
U_-(P)    = 3.856P + 2.556(1 - P)       (0, 0.482)         1.3
U_{-1}(P) = 3.747P + 2.658(1 - P)       (0.482, 0.538)     1.089
U_1(P)    = 2.974P + 3.556(1 - P)       (0.538, 0.645)    -0.582
U_2(P)    = 2.918P + 3.658(1 - P)       (0.645, 0.694)    -0.739
U_+(P)    = 2.747P + 4.047(1 - P)       (0.694, 1.0)      -1.3
U'_+(P)   = 2.523P + 4.556(1 - P)       (0.694, 0.780)    -2.033

Note: P* = P_0 = 0.538
      U(P*) = 3.243

In order to illustrate a method that can be used to answer these questions, let us consider
the manner in which the third strategy interval (μ_2, μ_3) of Fig. 9 changes into the fourth. We
know that the third strategy interval is correct over some range of μ when r_1 = 0.512 and
r_2 = 0.4096 since it gave a valid solution for μ = 1.3. To find which change occurs at μ_3, we
must make a guess and find if it is correct. For convenience, let us make the right one. That
is, let us assume that U'_-(P) becomes tangent to U_-(P) as μ approaches μ_3.

When  p  is  exactly  equal  to  p^,  then  U'  (P)  must  be  identical  to  U  (P).  Therefore,  in  the 
chain  diagram  in  Fig.  10(c)  we  can  delete  the  dotted  line  joining  s  and  s' ,  and  we  can  express 

U' (P)  in  terms  of  U,(P).  However,  p,  must  be  left  as  an  unknown. 

^  212 
The  equations  that  result  in  this  particular  example  are  as  follows.  Since  s'  - r  s  = 

,  11 

s'  - - ►  s^,  we  may  write 

=  ^^(2  +  4r^  +  5r^^)  +  r^(5  +  a^) 
b^  =  +  Sr^)  +  (5  +  b^) 

=  a_  =  q^(l  +  r^)  +  r^(2  +  a^) 
b;^  =  b_  =  2  +  b^ 

No equations involving P_- are required and, in fact, P_- is not unique. On the other hand, U_+(P)
is equal to U'_+(P) only at P_+. Therefore,

$$a_+ P_+ + b_+ (1 - P_+) = a'_+ P_+ + b'_+ (1 - P_+),$$

where $P_+ \xrightarrow{1} P_0$ (a look into box 1 carries P_+ into P_0), or

$$P_+ = \frac{P_0}{P_0 + (1 - P_0)\, r_1}.$$
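The closed form for P_+ follows from the rule for updating P after an unsuccessful look into box 1, the same rule that appears in the argument of the continuation payoff in Eq. (5-1). A short derivation is sketched below. The numerical line uses the Table I values for μ = 1.3 purely as an illustration; it is not a value quoted in the text.

```latex
\begin{align*}
P_0 &= \frac{P_+ r_1}{P_+ r_1 + 1 - P_+}
  && \text{(unsuccessful look into box 1)} \\
P_0\,(P_+ r_1 + 1 - P_+) &= P_+ r_1 \\
P_0 &= P_+\,\bigl[\,P_0 + (1 - P_0)\, r_1\,\bigr] \\
P_+ &= \frac{P_0}{P_0 + (1 - P_0)\, r_1}
\end{align*}
% Illustration with P_0 = 0.538 and r_1 = 0.512:
% P_+ = 0.538 / (0.538 + 0.462 \times 0.512) \approx 0.6946,
% which agrees with the boundary 0.694 listed in Table I to within rounding.
```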

Again Eq. (4-12) must be used, and Eq. (4-9) may be expressed in the form

$$b_+ - a_+ = a_- - b_- = \mu_3 .$$

The solution reveals that μ_3 = 1.393. The magnitude of the slope of s'_+ proves to be equal to
1.973. Since this value is greater than μ_3, no contradiction arises and our guess was correct.

The form of the payoff functions in the fourth strategy interval must therefore be that shown in
Fig. 9(d).

The values of μ_1, μ_2, μ_3, ..., μ_p, given in Table II, indicate the range over which each of
the chain diagrams of Fig. 10 is valid when r_1 = 0.512 and r_2 = 0.4096. The maximum payoff
U(P*) is also included for each μ_j. These payoffs indicate the manner in which the value of the
game decreases as μ increases from zero to μ_p. Note that as μ gets close to μ_p the value
decreases very slowly.


CHAPTER  5 

THE  SEARCHER'S  GOOD  STRATEGY 


5.1  INTRODUCTION 

In the last chapter, the evader's good strategy was developed by assuming that his optimum
strategy in the modified game was indeed his good strategy in G. In the process, we found that
he could guarantee a payoff equal to U(P) if he selected P initially, and that he could guarantee
a maximum payoff U(P*). The searcher's optimum strategy, which limited the evader to the
above payoff in the modified game, proved to be quite similar to that in F^∞. In fact, we were
able to solve G completely when μ was prohibitive because the game degenerated to a form
effectively the same as that of G^∞.

In this chapter, we shall develop the searcher's good strategy in G when the moving cost is
not prohibitive. It will be shown that the searcher can limit the evader to U(P) if he knows only
the initial P that the evader selects and no more. This statement applies even if the evader
knows the strategy used by the searcher. Once the evader has been thus limited, the solution
can be extended to G, where even the initial P is unknown, in much the same way as it was in
G^∞. This good strategy will limit the evader to the payoff U(P*).

The actual computation of the searcher's good strategy will prove to be fairly easy because
this good strategy is strongly related to the function U(P) and the chain diagram utilized in
computing it. Most of the work has been done once U(P) has been found. As we shall see, the
searcher's good strategy will be Markovian in form. Therefore, it is appropriate to examine
some basic properties of Markovian search strategies before considering the relationship
between the searcher's good strategy in G and his optimum strategy in F.

5.2  MARKOVIAN  SEARCH  STRATEGIES  AND  MODIFIED  GAMES  H  AND  H' 

In  this  section  we  shall  consider  search  strategies  that  generate  a  search  sequence  by  means 
of  a  discrete-time  Markov  process.  Such  a  process  is  a  mathematical  model  defined  by  a  set  of 
states,  a  set  of  transitions  between  these  states,  and  an  associated  set  of  transition  probabilities. 
Given  a  particular  state,  a  transition  will  occur  in  the  next  time  interval  to  some  other  state,  or 
possibly  to  the  same  state,  according  to  the  set  of  transition  probabilities  associated  with  that 
state.  A  search  sequence  can  be  generated  by  such  a  process  if  a  particular  look  is  associated 
with  each  transition  and  a  probability  distribution  for  selecting  a  starting  state  is  defined. 

Discrete-time Markov processes have most often been used to model the behavior of a physical
system. In such a situation, each state is defined by a particular set of values for a set of
variables that completely characterize the system at any given time. As a result, the primary
interest usually focuses on these states. For example, one may wish to calculate the probability
that the system will be in a particular state after k units of time if it is originally in a known
state.

When Markov search strategies are considered, however, the primary interest shifts to the
looks and hence to the transitions, for the Markov process is used strictly as a device for
generating a search sequence. Any discrete-time Markov process may be used once a look is
associated with each transition, and we need not consider the physical significance of any state. As
we shall find in Sec. 5.5, each state in the process defined by the searcher's good strategy will
have some significance. It is not appropriate at this point, however, to concern ourselves with
this problem. The only restrictions which will be imposed on the Markov processes are that the
number of states must be finite and that, at most, one transition from each state can be
associated with a given look. Since there are only two boxes, there can, of course, be at most two
transitions from each state.
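The way a Markov process of this kind generates a search sequence can be sketched in a few lines. The three-state diagram below is hypothetical (it is not the diagram of Fig. 12, which is not reproduced here), and the names `DIAGRAM` and `generate` are introduced only for this illustration. Each state carries at most one transition per look, as required above.

```python
import random

# Hypothetical diagram (NOT the report's Fig. 12):
# state -> {look: (probability of that look, state that follows)}
DIAGRAM = {
    "a": {1: (0.4, "b"), 2: (0.6, "c")},  # two possible looks
    "b": {1: (0.7, "a"), 2: (0.3, "c")},  # two possible looks
    "c": {1: (1.0, "a")},                 # only one look is possible
}


def generate(diagram, start_dist, n_looks, rng):
    """Draw a starting state, then emit one look per transition."""
    states = list(start_dist)
    state = rng.choices(states, weights=[start_dist[s] for s in states])[0]
    looks, visited = [], [state]
    for _ in range(n_looks):
        options = diagram[state]
        boxes = list(options)
        box = rng.choices(boxes, weights=[options[b][0] for b in boxes])[0]
        looks.append(box)
        state = options[box][1]
        visited.append(state)
    return looks, visited


if __name__ == "__main__":
    looks, visited = generate(DIAGRAM, {"a": 1.0}, 20, random.Random(0))
    print("search sequence:", "".join(map(str, looks)))
```

In this hypothetical diagram every look into box 2 leads to state "c", from which only a look into box 1 is possible, so two consecutive looks into box 2 can never be generated.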

A simple example of a Markovian search strategy that obeys these constraints is defined by
a transition diagram (Fig. 12), a set of transition probabilities, and the probability distribution
Y_0 used to select the starting state. In contrast to the usual convention in which p_ij is used to
represent the probability of a transition to σ_j, given σ_i, here y_j(k) is used to represent the
probability that box k is examined next, given σ_j. The term σ_{j|k} will be used to represent the
state that follows when this event occurs.

Fig. 12. The transition diagram of a Markovian search strategy.

The  above  transition  diagram  exhibits  several  properties  worth  noting.  First,  can  be 
occupied  only  at  the  beginning  of  the  process,  since  after  the  first  look  no  transitions  can  be  made 
into  it.  Therefore  is  a  special  example  of  a  transient  state.  In  general,  a  state  will  be  a 
transient  state  if  the  probability  that  it  can  be  occupied  approaches  zero  as  the  process  continues 
indefinitely.  Clearly,  cr^,  and  are  not  transient  states  and,  in  fact,  belong  to  a  single  re¬ 
current  chain.  A  recurrent  chain  consists  of  a  set  of  states  in  which  it  is  always  possible  to  get 
from  one  to  any  other  by  a  series  of  transitions.  Once  a  state  belonging  to  a  recurrent  chain  is 
entered,  only  states  belonging  to  that  chain  can  be  occupied  in  the  future.  Furthermore,  once 
this  has  occurred,  the  probability  that  each  of  the  states  in  the  chain  is  occupied  as  the  process 
continues  indefinitely  approaches  a  nonzero  limiting  value.  It  will  develop  that  Markov  processes 
with  only  one  recurrent  chain  will  be  sufficient  in  our  study. 

The  final  property  which  we  should  note  is  that  only  one  transition  can  occur  from  o-^  and  that 
the  next  look  associated  with  this  state  is  deterministic.  A  state  of  this  type  will  be  called  a  pure 
state.  States  from  which  more  than  one  transition  is  possible  will  be  called  mixed  states. 

In order to discuss the influence that such a Markovian search strategy has on the behavior
of the search evasion game, it is helpful to introduce the modified games H and H'. These
games are similar to the modified games F and F', but here we reverse things and require the
searcher to reveal part of his search strategy to the evader. In particular, he must reveal the
transition diagram and the associated transition probabilities that he uses and must tell the evader
which state is initially selected once the evader has hidden. He is not required, however, to
reveal the probability distribution Y_0 used for this selection, and for the time being we shall not
concern ourselves with it. In the same manner as before, H applies when the evader still has
an opportunity to move before the next look, and H' applies after this opportunity has passed.

Payoff functions can be associated with each of these games, but now a different pair must
always be associated with each state since both players are aware of the state that applies at
any given time. The quantity W_j(P) will be used to represent the future payoff that applies in H
if the search process is in σ_j and if the evader is in box 1 with probability P and uses an
optimum strategy in the future. The quantity W'_j(P) will be used to represent the corresponding
payoff in H'. No statement concerning the searcher's future strategy is included in these definitions,
since it is completely specified by the Markov process.

Although  the  searcher  is  no  longer  informed  of  the  value  of  P  that  applies  at  any  given  time 
and  the  evader  always  knows  exactly  where  he  is,  these  payoffs  are  still  functions  of  P.  This 
variable  is  the  one  that  an  observer  would  use  to  define  the  evader's  position  if  he  knew  both 
players'  strategies  and  was  able  to  observe  the  search  sequence  which  resulted.  This  assumes, 
of  course,  that  the  observer  cannot  see  when  the  evader  actually  moves.  In  these  games,  the 
evader's  moving  strategy  may  now  be  a  function  of  the  search  state  as  well  as  a  function  of  P 
and  his  own  position. 

A pair of functional equations can be written to express the payoffs associated with H in
terms of those associated with H' and vice versa. In H', the searcher's next look is completely
specified by the Markov process. Given σ_j, he will look into box 1 with probability y_j(1) and into
box 2 with probability y_j(2). Therefore,

$$
W'_j(P) = 1 + y_j(1)\,[P r_1 + 1 - P]\; W_{j|1}\!\left(\frac{P r_1}{P r_1 + 1 - P}\right)
        + y_j(2)\,[P + (1 - P)\, r_2]\; W_{j|2}\!\left(\frac{P}{P + (1 - P)\, r_2}\right).
\tag{5-1}
$$

Here, W_{j|k}(P) represents the payoff in H associated with the state that follows σ_j if box k is
examined. If σ_j is a pure state, the above equation will of course degenerate to a simpler form.
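The recursion of Eq. (5-1) can be written out directly. The sketch below assumes, as the equation suggests, that each look costs one unit of time and that r_1 and r_2 are the probabilities that a look into the occupied box fails; the function names are introduced here for illustration only. It also shows why a linear continuation payoff yields a linear W'_j(P): the normalizing factor of the updated P cancels against the probability that the look fails.

```python
R1, R2 = 0.512, 0.4096  # non-detection probabilities r_1, r_2 (values of Table I)


def after_miss(p, box, r1=R1, r2=R2):
    """Probability that the evader is in box 1 after an unsuccessful look into `box`."""
    if box == 1:
        return p * r1 / (p * r1 + 1.0 - p)
    return p / (p + (1.0 - p) * r2)


def w_prime(p, y1, w_next1, w_next2, r1=R1, r2=R2):
    """Right-hand side of Eq. (5-1) for a state with look probabilities (y1, 1 - y1)."""
    miss1 = p * r1 + 1.0 - p      # probability that a look into box 1 fails
    miss2 = p + (1.0 - p) * r2    # probability that a look into box 2 fails
    total = 1.0                   # one unit of time for the next look
    if y1 > 0.0:
        total += y1 * miss1 * w_next1(after_miss(p, 1, r1, r2))
    if y1 < 1.0:
        total += (1.0 - y1) * miss2 * w_next2(after_miss(p, 2, r1, r2))
    return total


def linear(a, b):
    """Payoff function a*P + b*(1 - P)."""
    return lambda p: a * p + b * (1.0 - p)


if __name__ == "__main__":
    mixed = lambda p: w_prime(p, 0.3, linear(3.0, 2.0), linear(2.5, 4.0))
    print(mixed(0.0), mixed(0.5), mixed(1.0))
```

As an arithmetic sanity check, feeding the Table I coefficients of U_-(P) into a pure state that looks into box 1 reproduces the Table I coefficients of U_1(P) to three decimals. This is a numerical observation on the tabulated values, not a claim taken from the text.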

In H, the evader has the opportunity to move. As before, the cost function C(P → P') is
associated with a transformation of the state variable. Since the evader can calculate the
payoffs {W'_j(P)},

$$W_j(P) = \max_{0 \le P' \le 1} \bigl\{ -\mu\,|P - P'| + W'_j(P') \bigr\}.$$

Each payoff function W'_j(P) must be linear and is valid for all P in (0, 1). It follows that

$$
W_j(P) =
\begin{cases}
-\mu P + W'_j(0), & \dfrac{dW'_j(P)}{dP} < -\mu \\[2ex]
W'_j(P), & -\mu \le \dfrac{dW'_j(P)}{dP} \le \mu \\[2ex]
-\mu (1 - P) + W'_j(1), & \dfrac{dW'_j(P)}{dP} > \mu
\end{cases}
\tag{5-2}
$$

When the slope of W'_j(P) is greater than μ, the evader must move to box 1 if he is in box 2, and
when the slope is less than -μ he must move to box 2 if he is in box 1. On the other hand, if
|dW'_j(P)/dP| is strictly less than μ, the evader should not move, and W_j(P) is equal to W'_j(P).
W_j(P) is also identical to W'_j(P) when |dW'_j(P)/dP| = μ, but here the evader can still move. If
dW'_j(P)/dP = μ, the evader can move to box 1 if he is in box 2 with any probability and can
therefore increase P by any desired amount. The reverse holds when dW'_j(P)/dP = -μ. This
property is very important, for it allows the evader's optimum strategy in F to be consistent with
his optimum strategy here when the searcher uses the correct Markov process. The three
possible ways in which the payoffs W_j(P) and W'_j(P) can be related to each other are illustrated
in Fig. 13. Here, W_j(P) is indicated by a broken line if it is unequal to W'_j(P).
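The three cases of Eq. (5-2) can be compared with a direct maximization over P'. In the sketch below, W'(P) = aP + b(1 - P), so that W'(0) = b, W'(1) = a and the slope is a - b; the function names and the grid size are choices made for this illustration. Because the quantity being maximized is piecewise linear in P' with a single kink at P' = P, its maximum lies at 0, at P, or at 1, all of which the grid contains.

```python
def w_closed_form(p, a, b, mu):
    """Eq. (5-2) for a linear W'(P) = a*P + b*(1 - P), whose slope is a - b."""
    slope = a - b
    if slope < -mu:
        return -mu * p + b            # evader shifts P to 0: W'(0) = b
    if slope > mu:
        return -mu * (1.0 - p) + a    # evader shifts P to 1: W'(1) = a
    return a * p + b * (1.0 - p)      # no move improves the payoff


def w_brute_force(p, a, b, mu, steps=1000):
    """Direct maximization of -mu*|P - P'| + W'(P') over a grid of P' values."""
    candidates = [k / steps for k in range(steps + 1)] + [p]
    return max(-mu * abs(p - q) + a * q + b * (1.0 - q) for q in candidates)


if __name__ == "__main__":
    mu = 1.3
    # Coefficients taken from three rows of Table I (slopes -2.033, 1.420, 1.089).
    for a, b in [(2.523, 4.556), (3.918, 2.498), (3.747, 2.658)]:
        print(a - b, w_closed_form(0.5, a, b, mu), w_brute_force(0.5, a, b, mu))
```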

Once the Markov process, except for the starting rule Y_0, is specified, the evader's optimum
strategy may be obtained by using a form of linear programming. Such a solution will maximize
W_j(P) for all P in each σ_j. Although we do not need to concern ourselves with the manner in
which such a solution can be obtained, it is worthwhile to discuss some properties implied by
the result.


Fig. 13. The relation between W_j(P) and W'_j(P).

Fig. 14. A possible set of payoff functions for the strategy shown in Fig. 12.


Let  us  suppose  that  the  Markov  process  of  Fig.  12  yields  the  solution  shown  in  Fig.  14.  Let 
us  assume  that  dW^(P)/dP  =  — p  and  that  dWj(P)/dP  >  p.  In  both  cr^  and  (7^,  dWj(P)/dP  =  p.  In 
however,  the  evader  must  move  to  box  1  if  he  is  in  box  2,  whereas  in  (7^  he  can  decrease  P 
by  any  desired  amount.  In  er^  ‘’'3>  other  hand,  he  should  not  move.  Note  that  in  gen¬ 

eral  I  dWj(P)/dP  I  p.  As  long  as  is  unknown  to  the  evader,  he  should  initially  hide  in  box  1 
with  probability  P*  since  this  guarantees  the  maximum  payoff  of  W^fP*)  =  W^fP*).  Of  course, 
if  the  searcher  started  the  Markov  process  in  <7^  or  cr^  the  evader  would  receive  more.  The 
searcher  would  be  foolish  to  do  this,  however,  for  there  exists  a  Yq  =  (0,  y^fcr^),  0)  which 

limits  the  evader  to  the  above  amount. 

Unfortunately, such a solution does not guarantee that this is the searcher's good strategy
in G, for we have no reason to assume that the transition probabilities or the transition diagram
is correct. We can be sure that such a strategy is the good strategy only if it limits the evader
to U(P*). Clearly, it would be a formidable task to guess the transition probabilities, let alone
the transition diagram associated with the good search strategy, if we did not have some
guidelines to help us on our way. For this reason the evader's good strategy has been developed first.
Before we consider how the good Markovian search strategy is related to the behavior of game
F, however, a fundamental property of good strategies should first be discussed.

5.3  AN  IMPORTANT  PROPERTY  OF  GOOD  STRATEGIES 

In  general,  a  two-person  zero-sum  game  has  a  value,  and  a  pair  of  good  strategies  exists, 
if  each  player  can  guarantee  that  he  will  receive  a  payoff  no  worse  than  the  value.  The  strategy 
that  guarantees  this  payoff  is  the  player's  good  strategy,  and  each  good  strategy  is  optimum 
against  the  other.  If  one  player  tells  the  other  that  he  is  using  his  good  strategy,  the  other  player 
can  gain  no  advantage  by  using  a  strategy  different  from  his  good  strategy. 

This  behavior  is  quite  different  from  that  associated  with  any  other  pair  of  strategies.  If 
one  player  were  to  use  an  arbitrary  one  and  inform  the  other  of  what  it  was,  the  other  player 
could  also  use  a  different  strategy  and  collect  a  larger  payoff.  If  in  turn,  he  told  the  first  player 
what  this  new  strategy  was,  that  player  would  probably  decide  to  use  a  different  one  himself. 

This process can be continued and leads to an "if I do this, he will do thus and so, but then I should
do something else, but then he will ..." type of reasoning. Only the good strategies avoid
instabilities of this type.

In most games of interest (excluding perfect information games such as chess) each player's
good strategy involves random decisions. Such a strategy is called a mixed strategy if the game
is expressed in normal form. On the other hand, it can be expressed in terms of a set of
behavioral strategies as we shall do here. As was mentioned earlier, a behavioral strategy
associates with each information set or behavioral state for the player in question a probability
distribution for selecting the next alternative. In general, the probability distribution associated
with a given behavioral state need not include a nonzero probability for each alternative. An
alternative that does have a nonzero probability in a given behavioral state can be called an
admissible alternative of that state. Alternatives that occur with probability zero will be called
inadmissible alternatives of that state. This of course holds only when the number of alternatives
in each state is finite as it is in the search evasion game.

The property of the good strategies that we wish to discuss here is as follows. If one player
uses his good strategy, the payoff will be equal to the value of the game as long as the other
player selects only admissible alternatives. That is, the payoff is the same for any set of
probability distributions over the behavioral states of one player as long as these distributions exclude
the selection of inadmissible alternatives and the other player uses his good strategy.
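This property can be illustrated on a small matrix game. The 2 x 3 payoff matrix below is hypothetical and has nothing to do with the search evasion game; the row player maximizes and the column player minimizes. The third column is inadmissible for the column player. Against the row player's good strategy, every mixture of the two admissible columns yields exactly the value, whereas the inadmissible column yields more.

```python
from fractions import Fraction as F

# Hypothetical payoff matrix: rows are the maximizer's alternatives,
# columns are the minimizer's alternatives.
A = [
    [F(2), F(-1), F(3)],
    [F(-1), F(1), F(3)],
]
ROW_GOOD = [F(2, 5), F(3, 5)]        # good strategy of the row player
COL_GOOD = [F(2, 5), F(3, 5), F(0)]  # good strategy of the column player
VALUE = F(1, 5)                      # value of this hypothetical game


def expected_payoff(x, y):
    """Expected payoff when the rows are mixed by x and the columns by y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(3))


if __name__ == "__main__":
    for t in (F(0), F(1, 4), F(1, 2), F(1)):
        print(t, expected_payoff(ROW_GOOD, [t, 1 - t, F(0)]))
    print("inadmissible column:", expected_payoff(ROW_GOOD, [F(0), F(0), F(1)]))
```

Exact fractions are used so that the equality with the value is not blurred by rounding.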

As an example, consider the good strategies in G^∞. The evader's good strategy requires
him to hide in box 1 with probability P* and in box 2 with probability 1 - P*. Thus, hiding in
either  box  is  admissible.  The  searcher's  good  strategy,  on  the  other  hand,  requires  him  to 
choose  one  of  the  two  infinite  search  sequences  optimum  at  P*,  and  these  two  sequences  are  his 
admissible  alternatives.  As  long  as  the  evader  uses  his  good  strategy  P*,  the  payoff  will  equal 
the  value  if  either  of  these  two  sequences  is  selected.  Similarly,  the  probability  distribution 
that  the  searcher  uses  to  choose  one  of  these  sequences  causes  the  payoff  to  be  independent  of  P 
and,  therefore,  equal  to  the  value  for  either  of  the  evader's  admissible  alternatives.  Note  that 
in  this  game  the  evader  has  no  inadmissible  alternatives,  whereas  the  searcher  has  an  infinite 
number. 

This property of good strategies is very useful when one wishes to derive the good strategy
for one player once the other's is known. Any alternative that causes the payoff to be unequal to
the value when the other player uses his good strategy must be an inadmissible alternative for
that behavioral state and can be excluded from consideration. Once this has been done, the
problem of finding the "good" probability distributions over the admissible alternatives in each
state is much simpler.

In the next section, we shall find that this property allows us to derive the transition diagram
associated with the searcher's good strategy in a straightforward manner. The associated
transition probabilities can then be computed by making use of the previously calculated function U(P).
Finally, the initial distribution Y_0 can be found to complete the solution.

5.4  DERIVATION  OF  THE  SEARCHER’S  TRANSITION  DIAGRAM 

When a Markov process is used to generate a search sequence, each state in the transition
diagram is a behavioral state of the searcher's strategy and each transition represents an
alternative. To start this process, one of these states must be selected by means of Y_0. A starting
state σ_0, not shown in the transition diagram, may be associated with this distribution. As would
be expected, only some of the states in the transition diagram should be initially selected with a
nonzero probability. The selection of these states corresponds to the admissible alternatives in
σ_0. The general form of the transition diagram and also the admissible alternatives associated
with σ_0 can be found by considering the behavior of game G when the evader is required to use
his good strategy. It will be more convenient, however, to derive the form of the transition
diagram first, and consider the start-up state σ_0 later.

In  order  to  do  this,  we  must  modify  slightly  our  restrictions  on  the  evader's  strategy.  In 
G,  the  evader's  good  strategy  contains  two  parts.  First,  he  must  use  P*  to  determine  where 
he  hides  initially,  and  then  he  must  exercise  his  good  moving  strategy  as  the  game  is  played. 

We  can  simplify  things  by  assuming  that  the  initial  P  is  arbitrarily  assigned  and  known  to  the 
searcher.  Once  it  has  been  used,  the  evader  is  required  to  exercise  his  good  strategy  and,  in 
fact,  must  move  before  the  first  look,  if  necessary.  Under  these  conditions,  the  searcher  can 
utilize  the  initial  P  in  starting  the  search  process,  and  we  must  find  a  transition  diagram  with 
which  he  can  limit  the  evader  to  U(P). 


Fig.  15.  A  pair  of  payoff  functions  for  F  and  F'. 


In  order  to  clarify  the  discussion,  let  us  consider  Fig.  15  [the  payoff  functions  in  Fig.  9(c)]  . 
Let  us  also  reproduce  the  associated  chain  diagram  in  Fig.  10(c)  with  the  moving  transitions 
eliminated. 

If the evader were to use his good moving strategy, this diagram could be used to generate
a search sequence that would yield a payoff equal to U(P). The process should merely be started
in the state σ_i that corresponds to the interval π_i in which the assigned P lies. It follows that
the look associated with each state in the chain must be admissible. Unfortunately, each state
is a pure state and the resulting search sequence would be deterministic. Under these conditions,
the evader could obviously secure a larger payoff by using a moving strategy other than his good
one. Clearly, we have not found all of the searcher's admissible alternatives.

In order to determine when other looks are admissible, the evader's good moving strategy
must be examined more closely. After a transition to σ_+ has occurred, P belongs to π_+ and the
evader must transform it to P_+, the left boundary of π_+. The state σ_+ requires a look into box 1
and a transition to σ_1 occurs. In the process, P_+ is transformed to P_0, the left boundary of π_1.
As a result, a look into either box is admissible, and if box 2 is examined, P will return to π_+.

It follows that a look into either box is admissible in σ_1 once the search process has occupied
σ_+. A look into box 2 produces a transition to σ_+, whereas a look into box 1 yields a transition
to σ_- as before. Similar reasoning shows that a look into either box is admissible and the same
transitions occur if the Markov process is in σ_{-1} and has occupied σ_- beforehand. Since this
reasoning does not apply until σ_+ or σ_- has been occupied, we must differentiate between these
two situations.
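The movement of P described above can be traced numerically. The sketch below uses the non-detection probabilities and the interval boundaries listed in Table I for a moving cost of 1.3; the variable names are introduced for this illustration, and the update rule assumes that P is revised in the usual way after an unsuccessful look.

```python
R1, R2 = 0.512, 0.4096                      # non-detection probabilities, Table I
P_MINUS, P_0, P_PLUS = 0.482, 0.538, 0.694  # interval boundaries, Table I


def after_miss(p, box):
    """Probability that the evader is in box 1 after an unsuccessful look into `box`."""
    if box == 1:
        return p * R1 / (p * R1 + 1.0 - p)
    return p / (p + (1.0 - p) * R2)


from_plus = after_miss(P_PLUS, 1)  # look into box 1 from the boundary P_+
back_up = after_miss(P_0, 2)       # look into box 2 from P_0
down = after_miss(P_0, 1)          # look into box 1 from P_0

if __name__ == "__main__":
    print("P_+ after a look into box 1:", from_plus)
    print("P_0 after a look into box 2:", back_up)
    print("P_0 after a look into box 1:", down)
```

With these values a look into box 1 carries P_+ to P_0 within the rounding of the table, a look into box 2 from P_0 returns P to the interval above P_+, and a look into box 1 from P_0 carries P below 0.482, the upper end of the interval listed for U_-(P).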

The transition diagram that takes this into account is shown in Fig. 16. In this diagram, the
states σ_i^t and σ_i^r are associated with each interval π_i in the no-move region. State σ_i^t is
transient and applies before a move occurs. State σ_i^r belongs to the recurrent chain and applies
thereafter. States σ_- and σ_+ also belong to the recurrent chain but have no superscripts because
there are no corresponding transient states. These will be called the moving states, since the
evader's good moving strategy requires a transformation of the state variable P in each one of
them. If P belongs initially to π_i, the searcher should start the search process in the transient
state σ_i^t. The only mixed states in the diagram are σ_{-1}^r and σ_1^r, which are entered only after
the proper moving state has been occupied. Any probability distribution over their associated
alternatives will produce the payoff U(P) as long as the evader uses his good moving strategy.

This transition diagram is typical of those which apply when both P_- and P_+ belong to the
interior of the recurrent region. In this situation, both s_- and s_+ are included in the
chain diagram associated with game F. In general, such a chain diagram must be of the
following form.

[Chain diagram]


Here,  of  course,  s  ^  may  be  equivalent  to  s' ,  and  may  be  equivalent  to  s^.  Both  always 

occur, for example, in the first strategy interval. As in the previous example, each state s_i of
the no-move region is replaced by two search states σ_i^t and σ_i^r. The recurrent chain is identical
to the above chain diagram except that each pair of moving states in the chain diagram is replaced
by a single moving state. Each transient state is connected to σ_- or σ_+ in exactly the same
manner that s_i is connected to s_- or s_+. In both σ_{-1}^r and σ_1^r a look into either box is admissible. A
look into box 1 produces a transition to σ_- and a look into box 2 produces a transition to σ_+.

Fig. 17(a-d). The transition diagrams associated with the chain diagrams of Figs. 10(a-d).
Associated past sequences: (a) 1, 2; (b) 11, 12, 21; (c) 1211, 1212, 1121, 2121, 2112;
(d) 121211, 121212, 121121, 112121, 212121, 212112, 211212.

The transition diagrams associated with the chain diagrams in Figs. 10(a) through (d) are
shown in Fig. 17. Note that each transient chain is used to bypass the mixed states until one of
the moving states has been entered.

It is not surprising to find, in each of these diagrams, that a finite part of the past search
sequence uniquely determines the recurrent state in which the process must be. This should
have been expected, since each state is a behavioral state of the searcher's strategy and must
have a corresponding information set. The above property holds for any transition diagram that
can be associated with a good search strategy. In Fig. 17(a), each state is defined by the last
look; σ_- applies if the last look was made into box 1 and σ_+ applies if it was made into box 2. In
Figs. 17(b) through (d), the past sequence of the last two, four, and six looks, respectively, is
required to determine uniquely the recurrent state that the process must occupy. Note that not
all the possible past sequences of a given number of looks have corresponding states in the
diagram. This is true because the pure states cause some sequences to be inadmissible. For
example, in each of these latter diagrams it is inadmissible to make two consecutive looks into
box 2. Naturally, the finite past sequence associated with each recurrent state is valid only when
the search process has generated the required minimum number of looks.
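The statement that the last four looks determine the recurrent state can be checked by enumeration for the standard example. The recurrent chain below is a reconstruction made for this sketch. The transitions out of the moving state above P_+ and out of the two mixed states follow the text of this section, while the two pure transitions out of "s-" and "s2" are inferred from the Table I coefficients and should be read as an assumption. The state labels are ad hoc.

```python
# Recurrent chain of the standard example (state -> {look: next state}).
# "s-" and "s+" stand for the moving states, "s1" and "s-1" for the mixed states.
CHAIN = {
    "s+": {1: "s1"},
    "s1": {1: "s-", 2: "s+"},
    "s-": {2: "s2"},
    "s2": {1: "s-1"},
    "s-1": {1: "s-", 2: "s+"},
}

# Four-look past sequences listed with Fig. 17 for this case.
LISTED = {"1211", "1212", "1121", "2121", "2112"}


def histories(chain, depth, window):
    """Map each window-long look history to the set of states it can end in."""
    found = {}

    def walk(state, looks):
        if len(looks) >= window:
            found.setdefault(looks[-window:], set()).add(state)
        if len(looks) < depth:
            for look, nxt in chain[state].items():
                walk(nxt, looks + str(look))

    for start in chain:
        walk(start, "")
    return found


if __name__ == "__main__":
    for history, states in sorted(histories(CHAIN, 8, 4).items()):
        print(history, sorted(states))
```

Every path through this chain, from any starting state, produces only the five listed four-look histories, each of which ends in exactly one state, and none contains two consecutive looks into box 2.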

When only one of the bounding points of the no-move region belongs to the interior of the
recurrent region, the transition diagram is only slightly different. As an example, consider the
chain diagram in Fig. 10(e):

In this case, π_+ extends into the recurrent region and s_+ and s'_+ are included in the chain diagram.
Once a move occurs in the transition from s_+ to s'_+, the point P will always be at the left boundary
of each interval π_i. As a result, P will be equal to P_0 when it is associated with the interval
π_1, and a look into either box will be admissible in σ_1^r. Point P will no longer be equal to P_0
when it belongs to π_{-1}, because the moving state s_- does not occur in the chain diagram.

The  searcher's  transition  diagram  is  shown  in  Fig.  18.  Here,  <7^  is  the  only  transient  state, 
for  all  of  the  other  states  transform  into  before  reaching  the  single  mixed  state  cr^ .  This,  of 
course,  does  not  always  occur.  In  general,  any  other  interval  tt.  that  transforms  into  before 
reaching  tt  ^  (and  hence  tt^)  requires  a  transient  state  in  addition  to  a  recurrent  state  uf.  In 


Fig.  18.  Transition  diagram  associated 
with the chain diagram of Fig. 10(e).
such a situation, a} will be connected by a series of transitions to a} in the same manner as

1 

r  I*  * 

a.  is  connected  to  a.  .  As  before,  if  P  initially  belongs  to  n.,  the  search  process  should  start 

1  t  ^  1 

in  (Tj  if  such  a  state  exists.  If  it  does  not  exist  the  process  should  start  in  the  unique  state  o-., 
which  is  associated  with  tTj  and  belongs  to  the  recurrent  chain. 

5.5  CALCULATION  OF  THE  GOOD  PROBABILITY  DISTRIBUTIONS  ASSOCIATED
WITH  EACH  MIXED  STATE

Once the correct transition diagram for the Markov process is determined, the good probability distributions associated with each mixed state must be calculated. We have seen that a payoff equal to U(P) will result for any set of probability distributions as long as the process is started in the correct state and the evader uses his good moving strategy. The good probability distributions that we now seek must limit the evader to this payoff even if he is no longer required to use his good strategy. We must still require him to reveal the initial value of P to the searcher. As would be expected, the modified games H and H' will be of use once we add this constraint.

In order to avoid confusion, let us first consider the case where both $P_-$ and $P_+$ belong to the interior of the recurrent region and use our standard example, i.e., the chain diagram associated with the modified games F and F' and the transition diagram associated with the modified games H and H' where the initial P is known (Fig. 19).
As we have seen, the search process should start in $\sigma_i^t$ if the initial P belongs to $\pi_i$, in $\sigma_+$ if it belongs to $\pi_+$, and in $\sigma_-$ if it belongs to $\pi_-$. Since the evader is no longer required to use his good strategy, it should be clear that the good probability distributions associated with $\sigma_-^r$ and $\sigma_+^r$ must insure that $W_i^t(P) = U_i(P)$ for each $\sigma_i^t$, that $W_-(P) = U_-(P)$, and that $W_+(P) = U_+(P)$.

It should be recalled that $W_i(P)$ and $W_i'(P)$ were defined as the payoffs that result if the evader uses an optimum future strategy. The evader's optimum strategy has more freedom than his good strategy because he knows the strategy used by the searcher. If the searcher does not use his good strategy, the evader can capitalize on this error.

It is easily shown that the payoff $W_i^t(P)$ associated with each transient state is identical to the corresponding payoff $U_i(P)$ as long as $W_-(P) = U_-(P)$ and $W_+(P) = U_+(P)$. If the search process starts in a transient state, only transient states are occupied until $\sigma_-$ or $\sigma_+$ is entered. Until this occurs, all looks are deterministic, and the resulting sequence is the same as that which transforms the equivalent state $s_i$ into $s_+$ or $s_-$. Therefore, any $W_i^t(P)$ must be related to $W_-(P)$ or $W_+(P)$ as far as the look sequence is concerned in exactly the same manner as $U_i(P)$ is related to $U_+(P)$ or $U_-(P)$. Each payoff $U_i(P)$ is appropriate to an interval $\pi_i$ that belongs to the no-move region, where $U_i(P) = U_i'(P)$ and $|dU_i'(P)/dP| < \mu$. Therefore, as long as $W_-(P) = U_-(P)$ and $W_+(P) = U_+(P)$, moving cannot be optimum for the evader in any transient search state either, and $W_i^t(P) = W_i^{\prime t}(P) = U_i(P) = U_i'(P)$.

Fig. 19. Chain and transition diagrams associated with the strategy interval of Fig. 15.

In contrast to this behavior, some moves must be admissible in $\sigma_-$ and $\sigma_+$, since the searcher's good strategy must allow the evader to choose any of the admissible alternatives associated with his good strategy at no loss. The state $\sigma_-$ corresponds to $s_-$, where the evader increases P to $P_-$. As a result, both moving from box 2 to box 1 and remaining in the same box must be admissible alternatives in $\sigma_-$. The function $W_-(P)$ must, therefore, have a slope equal to $+\mu$ and be identical to $W_-'(P)$. Similar reasoning can be used to show that the slope of $W_+(P)$ must be equal to $-\mu$ and that $W_+(P) = W_+'(P)$.

The necessary and sufficient condition that the searcher's good strategy must satisfy when the initial P is known should now be clear. A pair of probability distributions $Y_-^r = (y_-^r(1), y_-^r(2))$ and $Y_+^r = (y_+^r(1), y_+^r(2))$ must be found that causes $W_-(P)$ and $W_+(P)$ to equal $U_-(P)$ and $U_+(P)$, respectively. If this occurs, $W_-(P)$ and $W_+(P)$ will also be equal to the associated payoffs of game F. Such a condition insures that each payoff associated with a transient state will equal the corresponding payoff $U_i(P)$ and that the searcher will be able to limit the evader to U(P) at the beginning of the game.

Before  considering  the  actual  computation  of  the  "good''  probability  distributions,  let  us 
show  that  they  exist.  The  payoff  functions  U(P)  and  U'(P)  for  our  example  have  the  general  ap¬ 
pearance  shown  in  Fig.  15.  For  the  moment,  let  us  assume  that  W  (P)  =  U  (P)  and  W^{P)  = 

U^(P)  and  that  moving  occurs  only  in  a  and  Feedback  occurs  in  the  recurrent  chain  of  the 
transition  diagram  and  these  assumptions  will  be  correct  if  they  are  not  contradicted  by  this 
feedback.  If  we  set  yj(l)  equal  to  one,  then  <r^  -*  cr  in  exactly  the  same  manner  as  s^  -•  s  .  In 
this  case,  W]|^(P)  =  U^(P)  ^  U^(P),  a  contradiction.  On  the  other  hand,  consider  what  happens 

if  yi(i)  =  iitis  example,  ir,  is  the  interval  immediately  to  the  left  of  P,.  Therefore, 

^  ^  r  2 

-*  IT  ^  in  exactly  the  same  manner  as  -*  jt^.  As  a  result,  when  y^(l)  =  0,  -  cr^ 

as  s^  s  — *  s^.  It  follows  that  W^(P)  =  U2(P).  This  is  again  a  contradiction.  The  functions 

U2(P),  U^(P)  and  U^(P),  however,  all  intersect  at  P^  and  have  slopes  greater  than,  equal  to,  and 
less  than  — p,  respectively.  Therefore,  there  must  exist  a  y^(l)  where  0  ^  y^(l)  1  for  which 

the  slope  of  W^(P)  equals  — p.  Since  WY(P)  must  also  intersect  the  above  functions  at  P^,  it  must 
be  identical  to  U^(P)  when  this  occurs.  The  function  W^(P)  is  then  equal  to  W^(P)  and  no  contra¬ 
diction  results.  In  a  similar  manner,  the  slope  of  W  (P)  must  change  from  that  associated  with 
U  (P)  to  that  associated  with  U  ^(P)  as  y  ^(2)  goes  from  one  to  zero.  Therefore,  there  exists 
a  y  ^(2)  where  0  ^  y  ^(2)  1  for  which  W  (P)  =  U  (P)  =  W_(P). 

The above argument was developed by assuming that moving could occur only in $\sigma_-$ and $\sigma_+$. Actually, the result is valid as long as the evader can gain no advantage by moving in any recurrent state, that is, as long as $|dW_i^{\prime r}(P)/dP| < \mu$ for each $\sigma_i^r$. If this occurs, $W_i^{\prime r}(P) = W_i^r(P)$ for each $\sigma_i^r$, and the payoffs associated with each state in the transition diagram are consistent.

It can be shown that this final requirement is always satisfied by the probability distributions derived in the above manner, and, therefore, that they yield the searcher's good strategy. The proof, however, is somewhat involved. Since it is not particularly illuminating, it has been put in Appendix C.
In order to illustrate the manner in which the good probability distributions can be calculated, let us again refer to the transition diagram in Fig. 19. In the usual manner, we can let

$$W_i^r(P) = W_i^{\prime r}(P) = a_i^r P + b_i^r (1 - P),$$
$$W_i^t(P) = W_i^{\prime t}(P) = a_i P + b_i (1 - P),$$
$$W_-(P) = W_-'(P) = a_- P + b_- (1 - P),$$
$$W_+(P) = W_+'(P) = a_+ P + b_+ (1 - P).$$

Here, the coefficients associated with the payoffs of the transient states and the two moving states have no superscripts and are identical to those associated with U(P). It follows from Eq. (5-1) that

$$a_+^r = y_+^r(1)\,(1 + r_1 a_-) + y_+^r(2)\,(1 + a_+),$$
$$b_+^r = y_+^r(1)\,(1 + b_-) + y_+^r(2)\,(1 + r_2 b_+).$$

Since the transition from $\sigma_+$ to $\sigma_+^r$ involves deterministic looks only, Eq. (4-8) can be used to express $a_+$ in terms of $a_+^r$ and $b_+$ in terms of $b_+^r$. In this example, $\sigma_+ \xrightarrow{\;1\;} \sigma_+^r$ and we find that

$$a_+ = q_1 + r_1 (1 + a_+^r),$$
$$b_+ = 1 + b_+^r.$$

In these equations, the only unknowns are $y_+^r(1)$, $y_+^r(2)$, $a_+^r$ and $b_+^r$. Since $y_+^r(1) + y_+^r(2) = 1$, one of the four equations is redundant. As long as the solution of game F is correct, however, no contradiction will result. Since the equations are linear, $Y_+^r = (y_+^r(1), y_+^r(2))$ is easily calculated. The result when $r_1 = 0.512$, $r_2 = 0.4096$ and $\mu = 1.3$ is $Y_+^r = (0.4334, 0.5666)$.

The quantity $Y_-^r$ can be calculated in the same manner by noting that $\sigma_- \xrightarrow{\;21\;} \sigma_-^r$. Therefore,

$$a_- = q_1(2) + r_1 (2 + a_-^r),$$
$$b_- = q_2(1) + r_2 (2 + b_-^r),$$

while

$$a_-^r = y_-^r(1)\,(1 + r_1 a_-) + y_-^r(2)\,(1 + a_+),$$
$$b_-^r = y_-^r(1)\,(1 + b_-) + y_-^r(2)\,(1 + r_2 b_+).$$

The solution in the numerical example is $Y_-^r = (0.1575, 0.8425)$. It should not come as a surprise to find that $y_-^r(2) \ge y_+^r(2)$ and that $y_-^r(1) \le y_+^r(1)$. This is true in general.
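The calculation of the two mixed-state distributions can be sketched numerically. The sketch below is an illustration under stated assumptions, not the report's own procedure: the moving-state coefficients are not listed in this section, so they are back-computed from the two transient payoffs quoted in Sec. 5.6 (read here as 3.7471P + 2.658(1 - P) and 2.9744P + 3.556(1 - P)), on the assumption that those payoffs belong to the states that make one look into box 2 and then enter the plus moving state, and one look into box 1 and then enter the minus moving state. The function names are hypothetical.

```python
R1, R2 = 0.512, 0.4096        # escape probabilities r_1, r_2 of the numerical example
Q1, Q2 = 1 - R1, 1 - R2       # detection probabilities q_1, q_2


def moving_state_coefficients(after_look1, after_look2):
    """Back out (a_-, b_-) and (a_+, b_+) from two one-look payoffs.

    Assumption: after_look1 = (1 + r1*a_-, 1 + b_-) and
                after_look2 = (1 + a_+,    1 + r2*b_+).
    """
    (A, C), (B, D) = after_look1, after_look2
    minus = ((A - 1) / R1, C - 1)
    plus = (B - 1, (D - 1) / R2)
    return minus, plus


def mixed_distribution(a_r, minus, plus):
    """Solve a^r = y1*(1 + r1*a_-) + (1 - y1)*(1 + a_+) for y1 = y(1)."""
    A = 1 + R1 * minus[0]
    B = 1 + plus[0]
    y1 = (B - a_r) / (B - A)
    return y1, 1 - y1


minus, plus = moving_state_coefficients((2.9744, 3.556), (3.7471, 2.658))

# plus moving state, one look into box 1:  a_+ = q1 + r1*(1 + a_+^r)
a_r_plus = (plus[0] - Q1) / R1 - 1
# minus moving state, looks 2 then 1:      a_- = 2*q1 + r1*(2 + a_-^r)
a_r_minus = (minus[0] - 2 * Q1) / R1 - 2

Y_plus = mixed_distribution(a_r_plus, minus, plus)
Y_minus = mixed_distribution(a_r_minus, minus, plus)
print(Y_plus, Y_minus)
```

Only the equations for the a-coefficients are used; the b-equations are the redundant ones and can serve as a consistency check. Because the inputs are rounded to four figures, the results agree with the quoted distributions only to about three decimal places.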

Fig. 20. The payoff functions associated with the states in the transition diagram shown in Fig. 19 ($r_1 = 0.512$, $r_2 = 0.4096$, $\mu = 1.3$).

In Fig. 20, the payoff associated with each of the states in the transition diagram is graphed for this numerical example. Those associated with the transient states and the moving states are shown by solid lines, and those associated with the other recurrent states are shown by broken lines. Here, of course, each payoff is valid for all P, and there is no need to differentiate between a pair of payoffs for H and H' since they are identical. The lower bound of this ensemble of functions forms the payoff U(P). This follows from the fact that the searcher can limit the evader to U(P) when P initially belongs to $\pi_i$ only by starting the search process in $\sigma_i^t$. Finally,
it  should  be  noted  that  W*j(P)  =  W^j(P)  and  W*(P)  =  W^''(P)  at  P^,  while  W*(P)  =  wJ(P)  at  P^. 
This  is  true  because  these  are  the  respective  values  of  P  that  apply  in 
the  evader  uses  his  good  strategy. 

When only one of the bounding points of the no-move region belongs to the interior of the recurrent region, the searcher's good strategy can be derived in much the same manner. Only one probability distribution is required, however, for there is only one mixed state as well as a single moving state. In the transition diagram of Fig. 18, the payoffs associated with
a  y,  CT,  and  cr  ,  are  each  identical  to  the  corresponding  U,  (P)  in  game  F,  since  each  of  these 
states  is  transformed  into  <t^  before  reaching  the  mixed  state  cr^  .  The  function  (P)  will  of 
course  be  unequal  to  U^(P).  In  general,  any  other  recurrent  state  that  transforms  into  the 
mixed  state  will  have  an  associated  transient  state  and  W^{P)  ^  W? (P)  =  Uj^(P).  Appendix  C 
shows  that  |dW!^(P)/dP|  p  for  each  state  of  this  type  also. 


5.6  COMPLETION  OF  THE  SEARCHER'S  GOOD  STRATEGY 
WHEN  INITIAL  P  IS  UNKNOWN 

Now  that  we  have  seen  how  the  searcher  can  limit  the  evader  to  U(P)  when  he  knows  the 
evader's  initial  choice  of  P,  we  must  extend  this  strategy  to  the  actual  game,  where  initial  P 
is  unknown.  Clearly,  the  searcher  can  no  longer  limit  the  evader  to  U(P)  given  any  P.  This  is 
not  necessary,  however,  for  we  know  that  the  evader  can  guarantee  himself  U(P*).  As  long  as 
we  can  find  a  search  strategy  that  limits  him  to  this  payoff,  U(P*)  must  be  the  value  of  the  game 
and  the  searcher  has  a  complete  good  strategy. 
All that remains to complete the solution is the computation of the starting rule $Y_0$ for the Markov process that generates the search sequence. No look is associated with this starting rule, and it is derived in exactly the same manner as the searcher's starting rule in $G^\infty$. If the evader is to guarantee a payoff equal to U(P*), he must initially hide with probability P*. Therefore, if P* is the breakpoint that separates $\pi_i$ from $\pi_j$, a choice of either $\sigma_i^t$ or $\sigma_j^t$ is admissible for the searcher. Since $W_i^t(P)$ and $W_j^t(P)$ must be equal to U(P*) at P* and must have slopes of opposite sign, there must exist a $Y_0 = (y_0(\sigma_i^t), y_0(\sigma_j^t))$ which insures that $W_0(P) = U(P^*)$ for all P. If, on the other hand, the unusual occurs and U(P) is a maximum over a whole interval $\pi_i$, the searcher's starting rule is deterministic and requires the Markov process to start in $\sigma_i^t$. Here again the evader is limited to a payoff equal to the maximum of U(P). We may finally state with assurance that a value exists for our search evasion game and that the strategies we have developed for the two players are indeed good strategies.

In the numerical example that has been used throughout these chapters, P* occurs at $P_0 = 0.538$. Since $W_i^t(P) = 3.7471P + 2.658(1 - P)$ and $W_j^t(P) = 2.9744P + 3.556(1 - P)$, the starting rule requires that $y_0(\sigma_i^t) = 0.3481$ and $y_0(\sigma_j^t) = 0.6519$. The searcher's complete good strategy, illustrated in Fig. 21, limits the evader to $W_0(P) = U(P^*) = 3.243$.
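The starting rule can be checked with a few lines of arithmetic. The sketch below mixes two linear payoffs of the form aP + b(1 - P) so that the mixture no longer depends on P; it assumes the slightly garbled coefficients in the text read 3.7471 and 2.9744, and the function name `starting_rule` is hypothetical.

```python
def starting_rule(a1, b1, a2, b2):
    """Mix W_1(P) = a1*P + b1*(1-P) and W_2(P) = a2*P + b2*(1-P) so that
    y1*W_1 + y2*W_2 is constant in P. Returns (y1, y2, value, P_star)."""
    slope1 = a1 - b1                      # dW_1/dP
    slope2 = a2 - b2                      # dW_2/dP
    if slope1 * slope2 >= 0:
        raise ValueError("the two payoffs must have slopes of opposite sign")
    y1 = slope2 / (slope2 - slope1)       # solves y1*slope1 + (1 - y1)*slope2 = 0
    y2 = 1.0 - y1
    value = y1 * a1 + y2 * a2             # the constant mixed payoff
    p_star = (b2 - b1) / (slope1 - slope2)  # where the two payoffs intersect
    return y1, y2, value, p_star


y1, y2, value, p_star = starting_rule(3.7471, 2.658, 2.9744, 3.556)
print(round(y1, 4), round(y2, 4), round(value, 3), round(p_star, 3))
```

With these inputs the weights, the value and the intersection point agree with the figures quoted above to within rounding of the four-figure coefficients.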
CHAPTER  6

GENERALIZED  REWARD  STRUCTURE 


6.1  INTRODUCTION 

The two-box search evasion game considered in Chapters 2 through 5 had a very simple reward structure. The evader simply received one unit from the searcher each time a look was made and paid him $\mu$ units each time he moved. Thus, the reward associated with each look was independent of where the look was made and where the evader was hiding at the time. Also, the moving cost was not a function of the direction of the move.

We have deferred treating the more general reward structure until now because it has allowed us to study the behavior of the game with a simpler notation. In this chapter, we shall examine the two-box game with a more general reward structure. Most of the properties that have been developed will carry over directly. In fact, all the properties that make the two-box search evasion game interesting have already appeared. These properties arose because

(a)  The  searcher  did  not  know  where  the  evader  was  until  he  found  him. 

(b)  The  state  variable  P  was  changed  according  to  Bayes'  rule  by  each 
unsuccessful  look  and  this  transformation  was  a  function  of  the  escape 
probabilities  alone. 

(c)  The  evader  could  move  at  a  cost,  and  the  cost  of  a  transformation  of 
the  state  variable  P  was  proportional  to  the  magnitude  of  the  change 
in  P. 

These  properties  will  still  apply. 

In the example of revenuer vs moonshiner in Chapter 1, we noted that a reward of one unit was associated with each look. It took the revenuer one time unit to examine an area, and during this time the moonshiner was able to produce enough moonshine to secure one unit of profit. We can imagine that in a more general situation it takes the revenuer different amounts of time to examine the various areas. Also, the moonshiner may be able to operate more efficiently in one area than in another. That is, his earning rate may vary from box to box. As a result, the reward associated with a given look may depend on where the look is made and where the moonshiner, or evader, is hiding.

To  account  for  these  possibilities,  as  well  as  others,  let  us  introduce  the  following  reward 
structure: 

$\rho_i$ = evader's earning rate in box i if the searcher is not looking there;

$\eta_i$ = loss in earning rate in box i when the searcher is looking there (net earning rate = $\rho_i - \eta_i$);

$\tau_i$ = time required to examine box i;

$\lambda_i$ = detection loss of box i.

In order to make these quantities realistic, we shall require that $\rho_i, \tau_i > 0$; $\eta_i, \lambda_i \ge 0$.

Our reward structure can be interpreted as follows. In the event that the searcher looks into box j while the evader is hiding in box i, the evader receives the reward $\rho_i \tau_j$. If, on the other hand, the searcher looks into box i, the evader receives a reward of $(\rho_i - \eta_i)\tau_i$. The introduction of $\eta_i$ allows us to consider examples in which the evader cannot operate as efficiently when the searcher is examining the box in which he is hiding. To be realistic, $\eta_i$ should not exceed $\rho_i$; that is, $\rho_i - \eta_i \ge 0$. However, this restriction will not be formally imposed for it is not necessary in the mathematical development.

The net reward associated with the event in which the searcher looks into box i and finds the evader need not be equal to $(\rho_i - \eta_i)\tau_i$. The evader may be found at the beginning of this look. In addition, he may suffer a penalty for being caught. For example, the evader may be sent to jail. We can combine these losses in the detection loss $\lambda_i$. The net reward associated with the event in which the searcher looks into box i and finds the evader is, therefore,

$$(\rho_i - \eta_i)\tau_i - \lambda_i.$$
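The single-look rewards described above can be collected in one small function. This is a minimal sketch; the function name, the dictionary layout and the numerical values are illustrative assumptions rather than anything taken from the report.

```python
def look_reward(evader_box, look_box, found, rho, eta, tau, lam):
    """Reward to the evader for one look; boxes are indexed 1 and 2.

    rho, eta, tau, lam map a box index to its earning rate, earning-rate loss,
    look time and detection loss. `found` only matters when the searcher
    looks into the box that holds the evader.
    """
    i, j = evader_box, look_box
    if i != j:
        return rho[i] * tau[j]             # evader works undisturbed for tau_j
    reward = (rho[i] - eta[i]) * tau[i]    # reduced earning rate while searched
    return reward - lam[i] if found else reward


# Illustrative (hypothetical) parameter values.
rho = {1: 2.0, 2: 1.0}
eta = {1: 0.5, 2: 0.2}
tau = {1: 1.5, 2: 0.7}
lam = {1: 3.0, 2: 1.0}
q1 = 0.488

# Expected reward of a look into box 1 when the evader is there.
expected = q1 * look_reward(1, 1, True, rho, eta, tau, lam) \
    + (1 - q1) * look_reward(1, 1, False, rho, eta, tau, lam)
print(expected)
```

Averaging over detection gives the combination of earning rate, look time and detection loss that multiplies P in the recursion equations of this chapter.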

A final generalization that can be applied to the reward structure with little increase in complexity concerns the moving cost. We can let the moving cost depend on which move is made. Since only two boxes are considered here, only two moves can occur. Thus we can let $\mu_1$ represent the cost associated with a move to box 1 from box 2 and $\mu_2$ represent the cost of the move in the reverse direction. The subscripts of these coefficients correspond to those associated with the move probabilities $x_1$ and $x_2$.

In order to utilize the work of the previous chapters most efficiently, the two-box search evasion game with the generalized reward structure will be discussed in the same sequence. Those properties that still hold will be mentioned, exceptions will be noted, and the new form of each of the various equations that were of use before will be listed. To simplify the association of each new form with the old, each new equation will be numbered as before but will be followed by the symbol §.

6.2  $G^\infty$:  THE  NO-MOVE  GAME

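When moving is not allowed, $G^\infty$ can be solved as before by using the modified game $F^\infty$. The function $U^\infty(P)$ is continuous and convex. It is piecewise linear under the conditions stated in Chapter 2. A single infinite search sequence is optimum over any interval in P over which $U^\infty(P)$ is linear. The fundamental recursion equation that applies in place of Eq. (2-4) is

$$U^\infty(P) = \min \begin{cases} U^\infty(P;1) = P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)\rho_2\tau_1 + [Pr_1 + 1 - P]\; U^\infty\!\left(\dfrac{Pr_1}{Pr_1 + 1 - P}\right) \\[2ex] U^\infty(P;2) = P\rho_1\tau_2 + (1 - P)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2] + [P + (1 - P)r_2]\; U^\infty\!\left(\dfrac{P}{P + (1 - P)r_2}\right) \end{cases} \qquad \text{(2-4)§}$$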

The searcher's optimum strategy requires that

if $P > P_0$, look into box 1;

if $P < P_0$, look into box 2;

if $P = P_0$, look into either box.

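Again, this property is derived rigorously in Appendix A. The point $P_0$ can be calculated again by requiring that $U^\infty(P_0; 12) = U^\infty(P_0; 21)$. The equality

$$\begin{aligned} U^\infty(P;12) &= \{P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)\rho_2\tau_1\} \\ &\quad + \{Pr_1\rho_1\tau_2 + (1 - P)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2]\} \\ &\quad + [Pr_1 + (1 - P)r_2]\; U^\infty\!\left(\frac{Pr_1}{Pr_1 + (1 - P)r_2}\right) \\ = U^\infty(P;21) &= \{P\rho_1\tau_2 + (1 - P)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2]\} \\ &\quad + \{P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)r_2\rho_2\tau_1\} \\ &\quad + [Pr_1 + (1 - P)r_2]\; U^\infty\!\left(\frac{Pr_1}{Pr_1 + (1 - P)r_2}\right) \end{aligned}$$

implies that

$$P_0 = \frac{q_2\rho_2\tau_1}{q_1\rho_1\tau_2 + q_2\rho_2\tau_1}.$$

It is interesting to note that $P_0$ is independent of $\eta_i$ and $\lambda_i$. Since $\rho_i$ and $\tau_i$ must be positive but finite, $P_0$ will always lie in the interior of the interval (0, 1).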
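The breakpoint condition can be checked numerically. The sketch below evaluates the expected reward of the first two looks for the orders 12 and 21 (the continuation term is the same for both orders and is left out) and compares them at the breakpoint. The closed form used for the breakpoint is the one obtained by cancelling the common terms in the two-look equality, so it is a derived expression rather than a quotation; the class `Box`, the function names and the parameter values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    rho: float   # earning rate when the searcher looks elsewhere
    eta: float   # loss in earning rate while the box is being searched
    tau: float   # time needed to examine the box
    lam: float   # detection loss
    q: float     # detection probability (escape probability r = 1 - q)


def two_look_reward(P, first, box1, box2):
    """Expected reward of two looks, one into each box, starting with `first`."""
    r1, r2 = 1 - box1.q, 1 - box2.q
    in1 = (box1.rho - box1.eta) * box1.tau - box1.q * box1.lam
    in2 = (box2.rho - box2.eta) * box2.tau - box2.q * box2.lam
    if first == 1:
        return (P * in1 + (1 - P) * box2.rho * box1.tau
                + P * r1 * box1.rho * box2.tau + (1 - P) * in2)
    return (P * box1.rho * box2.tau + (1 - P) * in2
            + P * in1 + (1 - P) * r2 * box2.rho * box1.tau)


def breakpoint_p0(box1, box2):
    """P at which the orders 12 and 21 give the same expected reward."""
    x = box2.q * box2.rho * box1.tau
    y = box1.q * box1.rho * box2.tau
    return x / (x + y)


box1 = Box(rho=2.0, eta=0.5, tau=1.5, lam=3.0, q=0.488)
box2 = Box(rho=1.0, eta=0.2, tau=0.7, lam=1.0, q=0.5904)
p0 = breakpoint_p0(box1, box2)
print(p0, two_look_reward(p0, 1, box1, box2), two_look_reward(p0, 2, box1, box2))
```

Above the breakpoint the order 12 yields the smaller reward, which matches the rule that the searcher looks into box 1 when P exceeds the breakpoint.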

The transformation of the state variable P is a function of $r_1$ and $r_2$ (or $q_1$ and $q_2$) only; therefore the recurrent region $(P_{01}, P_{02})$ is defined as in Chapter 2. Once P enters this region it must remain in it as long as the searcher uses an optimum strategy. It is possible to calculate $U^\infty(P)$ outside of this region once $U^\infty(P)$ is known within it. The payoff inside the recurrent region can be calculated in the same manner as before because the optimum chain diagram remains the same. Only the position of $P_0$ and, therefore, the other breakpoints are functions of $\rho_i$ and $\tau_i$.

When a chain diagram is used to generate the search sequence, a linear payoff $U_i^\infty(P) = a_i P + b_i(1 - P)$ can again be associated with each state $s_i$ in the chain. If the chain is optimum, $U_i^\infty(P)$ will be equal to the optimum payoff over the associated interval $\pi_i$.

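The equations which relate the payoff associated with one state to that of another are

$$s_i \xrightarrow{\;1\;} s_j: \qquad a_i = (\rho_1 - \eta_1)\tau_1 - q_1\lambda_1 + r_1 a_j, \qquad b_i = \rho_2\tau_1 + b_j;$$

$$s_i \xrightarrow{\;2\;} s_j: \qquad a_i = \rho_1\tau_2 + a_j, \qquad b_i = (\rho_2 - \eta_2)\tau_2 - q_2\lambda_2 + r_2 b_j. \qquad \text{(2-11)§}$$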
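The one-look relations lend themselves to a backward recursion over a look sequence. The following sketch applies them from the last look to the first; the parameter dictionary and the function names are hypothetical, and the check uses the simple reward structure of the earlier chapters (unit reward per look, no detection loss).

```python
def step_back(look, a_j, b_j, p):
    """Coefficients (a_i, b_i) of s_i when one look into box `look` leads to s_j."""
    r1, r2 = 1 - p["q1"], 1 - p["q2"]
    if look == 1:
        a_i = (p["rho1"] - p["eta1"]) * p["tau1"] - p["q1"] * p["lam1"] + r1 * a_j
        b_i = p["rho2"] * p["tau1"] + b_j
    else:
        a_i = p["rho1"] * p["tau2"] + a_j
        b_i = (p["rho2"] - p["eta2"]) * p["tau2"] - p["q2"] * p["lam2"] + r2 * b_j
    return a_i, b_i


def coefficients(sequence, a_end, b_end, p):
    """Apply the one-look relations backward over a list of looks (1 or 2)."""
    a, b = a_end, b_end
    for look in reversed(sequence):
        a, b = step_back(look, a, b, p)
    return a, b


# Simple reward structure: one unit per look, no losses.
unit = dict(rho1=1.0, rho2=1.0, eta1=0.0, eta2=0.0, tau1=1.0, tau2=1.0,
            lam1=0.0, lam2=0.0, q1=0.488, q2=0.5904)
a, b = coefficients([2, 1], 0.0, 0.0, unit)
print(a, b)
```

For the sequence 21 followed by a zero terminal payoff, the recursion gives a = 2 and b = 1 + r_2, the expected number of looks made when the evader is in box 1 or box 2 respectively.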
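When $s_i$ is transformed into $s_j$ by a sequence defined by $\{t_m(n)\}$ which involves a total of $k_1$ looks into box 1 and $k_2$ looks into box 2, we have

$$a_i = q_1 \sum_{n=1}^{k_1} r_1^{\,n-1} \{(\rho_1 - \eta_1)\, n\tau_1 + \rho_1 [t_1(n) - n]\,\tau_2 - \lambda_1\} + r_1^{\,k_1} [(\rho_1 - \eta_1)\, k_1\tau_1 + \rho_1 k_2\tau_2 + a_j],$$

$$b_i = q_2 \sum_{n=1}^{k_2} r_2^{\,n-1} \{(\rho_2 - \eta_2)\, n\tau_2 + \rho_2 [t_2(n) - n]\,\tau_1 - \lambda_2\} + r_2^{\,k_2} [\rho_2 k_1\tau_1 + (\rho_2 - \eta_2)\, k_2\tau_2 + b_j]. \qquad \text{(2-13)§}$$

By letting $\{t_m(n)\}$ represent an infinite search sequence, it becomes clear that the payoff associated with such an infinite sequence must be linear in P. Finally, if $r_1^{\,n_1} = r_2^{\,n_2}$ the optimum search sequence will be periodic inside the recurrent region and we find that, for $s_i \to s_i$,

$$a_i = \frac{1}{1 - r_1^{\,n_1}} \left( q_1 \sum_{j=1}^{n_1} r_1^{\,j-1} \{(\rho_1 - \eta_1)\, j\tau_1 + \rho_1 [t_1(j) - j]\,\tau_2 - \lambda_1\} + r_1^{\,n_1} [(\rho_1 - \eta_1)\, n_1\tau_1 + \rho_1 n_2\tau_2] \right),$$

$$b_i = \frac{1}{1 - r_2^{\,n_2}} \left( q_2 \sum_{j=1}^{n_2} r_2^{\,j-1} \{(\rho_2 - \eta_2)\, j\tau_2 + \rho_2 [t_2(j) - j]\,\tau_1 - \lambda_2\} + r_2^{\,n_2} [\rho_2 n_1\tau_1 + (\rho_2 - \eta_2)\, n_2\tau_2] \right).$$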
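The periodic formula for the coefficient of P can be cross-checked against the one-look relations. The sketch below computes that coefficient for one period in two ways: from the sum over the looks into box 1, and as the fixed point of the one-look recursion applied around the period. It assumes that the n-th look into box 1 is located by its position in the period, counted from 1; the function names and parameter values are illustrative only.

```python
def periodic_a_sum(period, rho1, eta1, tau1, lam1, q1, tau2):
    """Coefficient a_i of a periodic sequence, from the sum over looks into box 1."""
    r1 = 1 - q1
    n1, n2 = period.count(1), period.count(2)
    total, j = 0.0, 0
    for t, look in enumerate(period, start=1):
        if look == 1:
            j += 1   # this is the j-th look into box 1, made at position t
            total += q1 * r1 ** (j - 1) * (
                (rho1 - eta1) * j * tau1 + rho1 * (t - j) * tau2 - lam1)
    total += r1 ** n1 * ((rho1 - eta1) * n1 * tau1 + rho1 * n2 * tau2)
    return total / (1 - r1 ** n1)


def periodic_a_fixed_point(period, rho1, eta1, tau1, lam1, q1, tau2):
    """Same coefficient, as the fixed point of a -> c + m*a around one period."""
    r1 = 1 - q1
    c, m = 0.0, 1.0
    for look in reversed(period):
        if look == 1:
            c = (rho1 - eta1) * tau1 - q1 * lam1 + r1 * c
            m = r1 * m
        else:
            c = rho1 * tau2 + c
    return c / (1 - m)


args = dict(rho1=2.0, eta1=0.5, tau1=1.5, lam1=3.0, q1=0.488, tau2=0.7)
period = [1, 2, 1, 2, 1, 2, 1]   # four looks into box 1, three into box 2
print(periodic_a_sum(period, **args), periodic_a_fixed_point(period, **args))
```

The two routes agree because both add up, look by look, the reward earned while the evader in box 1 is still undetected.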


The only important differences which arise with the introduction of the generalized reward structure are that $U^\infty(P)$ may be negative for some P and that it need not achieve its maximum inside the recurrent region. The former situation can occur if one or both of the detection losses is very large. This has no other effect on the solution, although it may deter the evader from playing the game. Furthermore, no difficulty should be encountered if $U^\infty(P)$ is a maximum outside the recurrent region. The payoff inside the recurrent region can be calculated in exactly the same manner as before, and once this has been done, it can be calculated as far into a transient region as is necessary.
To illustrate the manner in which the solution can be extended into the transient regions, let us consider our familiar example where $r_1^4 = r_2^3$. The interval (0, 1) can be partitioned into the intervals over which $U^\infty(P)$ is linear as before.

The same chain diagram also applies for intervals inside the recurrent region. Transient states can be added by noting that a look into box 1 shifts an interval $n_2$ places to the left and a look into box 2 shifts an interval $n_1$ places to the right. The chain diagram that includes some of the transient states is shown in Fig. 22. Once the payoff associated with each state in the recurrent chain is known, those associated with each transient state can be calculated by using Eq. (2-11)§ or Eq. (2-13)§. The values of the separating breakpoints can be calculated in exactly the same way as before. Naturally, the solution should be extended only in the direction in which $U^\infty(P)$ increases and only as far as is necessary.

The point P* is again the evader's good strategy in $G^\infty$. Once the payoffs associated with the two states that are optimum at P = P* have been found, the searcher's good strategy can be calculated. Equation (2-15) can be used to make this computation without alteration.


6.3  GAME  $G^0$:  $\mu_1 = \mu_2 = 0$

When both moving costs are equal to zero, the game can be solved as it was in Chapter 3. The evader should restore the state variable P to its optimum value $P_0$ after each unsuccessful look, and the searcher should make each look according to the probability distribution

$$Y_0 = (Y_0,\; 1 - Y_0).$$

If the evader always restores the state variable to P before each next look and the searcher looks into box 1,

$$U^0(P;1) = P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)\rho_2\tau_1 + (Pr_1 + 1 - P)\, U^0(P).$$
If a look into box 1 is optimum for a given P, it is always optimum for that P; therefore,

$$U^0(P;1) = \frac{1}{Pq_1} \{P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)\rho_2\tau_1\}.$$

Similarly, if the evader always returns the state variable to P before each look and the searcher always looks into box 2, the payoff is

$$U^0(P;2) = \frac{1}{(1 - P)q_2} \{P\rho_1\tau_2 + (1 - P)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2]\}.$$

The optimum value $P_0$ of the state variable is that which maximizes the minimum of $U^0(P;1)$ and $U^0(P;2)$. Since both of these functions are nonlinear, it must be shown that there indeed exists a $P_0$ where $0 < P_0 < 1$ for which $\max_P (\min_i U^0(P;i)) = U^0(P_0;1) = U^0(P_0;2)$. The demonstration is carried out in Appendix D. It follows that the evader's good strategy and the resulting guaranteed payoff can be found by solving the equations

$$V^0 = \frac{1}{P_0 q_1} \{P_0[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P_0)\rho_2\tau_1\} = \frac{1}{(1 - P_0)q_2} \{P_0\rho_1\tau_2 + (1 - P_0)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2]\}.$$

The nonlinearity of the functions $U^0(P;1)$ and $U^0(P;2)$ might appear surprising at first thought, because similar functions such as U'(P; i) in Chapter 4 have usually been linear or piecewise linear. Note, however, that in that chapter U'(P; i) was defined as the payoff in F' which resulted if the searcher looked first into box i and both players used optimum strategies thereafter. The function $U^0(P;i)$, on the other hand, has been defined here as the payoff that results if the searcher always looks into box i and the evader always returns the state variable to P. Thus, the evader's entire future strategy is a function of the variable P. When P is unequal to $P_0$, the evader's entire future strategy is not optimum. This point was not mentioned in Chapter 3 since the general game with $\mu \ne 0$ had not been considered at that time.

Let us return to $G^0$, where the searcher's good strategy can be found in the same manner as was the evader's. If the searcher uses the probability distribution Y = (Y, 1 - Y) and the evader hides in box 1, the payoff is

$$W^0(Y;1) = Y[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - Y)\rho_1\tau_2 + [Yr_1 + (1 - Y)]\, W^0(Y).$$

If, on the other hand, the evader hides in box 2, we find that

$$W^0(Y;2) = Y\rho_2\tau_1 + (1 - Y)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2] + [Y + (1 - Y)r_2]\, W^0(Y).$$

The searcher's good strategy can be found by solving the equation

$$\frac{1}{Y_0 q_1} \{Y_0[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - Y_0)\rho_1\tau_2\} = \frac{1}{(1 - Y_0)q_2} \{Y_0\rho_2\tau_1 + (1 - Y_0)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2]\}$$

because the above equation yields a $Y_0$ for which $0 < Y_0 < 1$ and $W^0(Y_0) = U^0(P_0) = V^0$, the value. These properties are also shown in Appendix D.
Since the equations used to determine $P_0$, $Y_0$ and $V^0$ are nonlinear, it is not convenient to express the solution algebraically. Nonetheless, the solutions are rather easy to obtain numerically.
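One way to obtain the numerical solution is by bisection. In the sketch below the evader's payoff for a searcher who always looks into box 1 falls as P grows, while the payoff for box 2 rises, so their difference changes sign exactly once on (0, 1); the searcher's pair of payoffs behaves the same way in Y. The function name, argument layout and parameter values are assumptions made for illustration.

```python
def solve_g0(rho1, eta1, tau1, lam1, q1, rho2, eta2, tau2, lam2, q2):
    """Return (P0, Y0, value seen by the evader, value seen by the searcher)."""
    A1 = (rho1 - eta1) * tau1 - q1 * lam1   # evader in box 1, look into box 1
    A2 = (rho2 - eta2) * tau2 - q2 * lam2   # evader in box 2, look into box 2

    def bisect(f, lo=1e-12, hi=1 - 1e-12):
        # f is positive near lo, negative near hi and decreasing in between
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def U1(P):
        return (P * A1 + (1 - P) * rho2 * tau1) / (P * q1)

    def U2(P):
        return (P * rho1 * tau2 + (1 - P) * A2) / ((1 - P) * q2)

    def W1(Y):
        return (Y * A1 + (1 - Y) * rho1 * tau2) / (Y * q1)

    def W2(Y):
        return (Y * rho2 * tau1 + (1 - Y) * A2) / ((1 - Y) * q2)

    P0 = bisect(lambda P: U1(P) - U2(P))
    Y0 = bisect(lambda Y: W1(Y) - W2(Y))
    return P0, Y0, U1(P0), W1(Y0)


# Simple reward structure: the crossing is at q2/(q1 + q2), value 1/q1 + 1/q2.
q1, q2 = 0.488, 0.5904
P0, Y0, Vu, Vw = solve_g0(1, 0, 1, 0, q1, 1, 0, 1, 0, q2)
print(P0, Y0, Vu, Vw)
```

For a general reward structure the two values returned should coincide, which gives a convenient check on any set of parameters.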

In  several  special  cases  algebraic  solutions  can  be  found  readily.  For  example,  if  ’Ij  = 


^0  T^/q^  +  r^/q^ 


Pi/\ 


^0  P±/^i  +  Pz/^z 


V  = 


When  Xj  =  0  and  =  Pj.  [when  (pj  —  t.  =  0],  then 


V°  = 


In this last example, the searcher concentrates more attention on a box if the earning rate is large and the look time is small. The evader does the reverse. Both players, however, concentrate more attention on a box if its detection probability is small.

6.4  GAME  G 

When the evader can move between looks at a cost, the search evasion game can be solved in essentially the same manner as in Chapters 4 and 5. The efficient move condition given in Eq. (4-1) still holds, but now the cost of the transformation of the state variable depends on whether it is increased or decreased; that is,

$$C(P \to P') = \begin{cases} \mu_1 (P' - P), & P' \ge P \\ \mu_2 (P - P'), & P' \le P \end{cases} \qquad \text{(4-2)§}$$

As was mentioned, $\mu_1$ is the cost associated with a move to box 1 from box 2 and $\mu_2$ applies for a move in the reverse direction.

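The modified games F and F' can be used as before, but the functional equations are now

$$U'(P) = \min \begin{cases} U'(P;1) = P[(\rho_1 - \eta_1)\tau_1 - q_1\lambda_1] + (1 - P)\rho_2\tau_1 + [Pr_1 + 1 - P]\; U\!\left(\dfrac{Pr_1}{Pr_1 + 1 - P}\right) \\[2ex] U'(P;2) = P\rho_1\tau_2 + (1 - P)[(\rho_2 - \eta_2)\tau_2 - q_2\lambda_2] + [P + (1 - P)r_2]\; U\!\left(\dfrac{P}{P + (1 - P)r_2}\right) \end{cases} \qquad \text{(4-3)§}$$

and

$$U(P) = \max_{P'} \begin{cases} -\mu_1 (P' - P) + U'(P'), & P' \ge P \\ -\mu_2 (P - P') + U'(P'), & P' \le P \end{cases} \qquad \text{(4-4)§}$$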
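The move step can be illustrated on a grid. The sketch below takes a payoff after the move as given and maximizes it, net of the direction-dependent moving cost, over all grid points. The payoff used is a made-up piecewise-linear function chosen only so that a no-move region appears; the function names, the grid and the cost values are likewise assumptions.

```python
def move_cost(P, P_new, mu1, mu2):
    """Cost of changing the state variable from P to P_new."""
    if P_new >= P:
        return mu1 * (P_new - P)   # probability shifted toward box 1
    return mu2 * (P - P_new)       # probability shifted toward box 2


def payoff_before_move(u_prime, grid, mu1, mu2):
    """U(P) = max over P' in grid of [U'(P') - C(P -> P')], for each P in grid."""
    return [max(u_prime(Pn) - move_cost(P, Pn, mu1, mu2) for Pn in grid)
            for P in grid]


def u_prime(P):
    # Hypothetical U'(P): slopes 4, 0.5 and -4.5, with breakpoints at 0.4 and 0.5.
    return min(1 + 4 * P, 2.4 + 0.5 * P, 4.9 - 4.5 * P)


grid = [i / 100 for i in range(101)]
U = payoff_before_move(u_prime, grid, mu1=1.3, mu2=0.9)
print(U[20], U[45], U[80])
```

In this toy case the no-move region is the middle piece, where the slope lies between the two cost bounds; to its left U rises with slope 1.3 and to its right it falls with slope 0.9.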
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Both  U(P)  and  U'(P)  must  still  be  continuous  and  convex  (see  Appendix  B).  As  in  Fig.  5,  U(P) 
and  U'(P)  are  identical  over  the  no-move  region  (P  ,  P^),  but  now  this  region  is  defined  by 


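$$\frac{dU'(P)}{dP} \begin{cases} \ge \mu_1, & P < P_- \\ \le \mu_1, & P > P_- \end{cases} \qquad\qquad \frac{dU'(P)}{dP} \begin{cases} \ge -\mu_2, & P < P_+ \\ \le -\mu_2, & P > P_+ \end{cases}$$
(4-5)§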

The function U(P) has a slope equal to $+\mu_1$ in the moving region $(0, P_-)$ and a slope of $-\mu_2$ in the moving region $(P_+, 1)$.

The fundamental property of the searcher's optimum strategy in F', which was discussed in Sec. 4.3, remains the same (see Appendix B). There exists a $P_0$, where $P_- \le P_0 \le P_+$, such that

$$P < P_0 \;\Rightarrow\; U'(P;2) < U'(P;1) \;\Rightarrow\; \text{look into box 2},$$

$$P > P_0 \;\Rightarrow\; U'(P;1) < U'(P;2) \;\Rightarrow\; \text{look into box 1}.$$


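The moving costs are again prohibitive and the game can be solved in terms of $G^\infty$ if the no-move region contains the recurrent region. This occurs if

$$\mu_1 > \left.\frac{dU^\infty(P)}{dP}\right|_{P = P_{01}}, \qquad \mu_2 > -\left.\frac{dU^\infty(P)}{dP}\right|_{P = P_{02}}.$$
(4-7)§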


When this condition holds, U(P) will be identical to $U^\infty(P)$ over the no-move and hence the recurrent region. Although U(P) may be a maximum outside the recurrent region, clearly it must achieve this maximum value inside the no-move region where it is identical to $U^\infty(P)$. Thus, the value and good strategies can be obtained from the function $U^\infty(P)$ as before. It should be noted that a simple prohibitive bound cannot be placed on either moving cost. The moving costs can be considered prohibitive in the previous sense only if they both satisfy (4-7)§.

When the moving costs are not prohibitive, the correct chain diagram must be found before the payoff function U(P) can be calculated. This can again be accomplished by studying the manner in which the form of the payoff functions changes from strategy interval to strategy interval as $\mu_1$ and $\mu_2$ increase up to their appropriate values. In order to do this, it is best to hold $\mu_1$ and $\mu_2$ in a fixed ratio as they are increased.

Although the form of the chain diagram associated with the optimum search strategy is independent of the reward coefficients when the moving costs are prohibitive, this is not true when the moving costs are not prohibitive. In Chapter 4, we saw that two possible changes could occur in the form of the payoff function U(P), and hence in the form of the chain diagram, at the end of each strategy interval. The change that now occurs depends on the reward coefficients and the ratio $\mu_1/\mu_2$ that are used. Thus, the sequence of chain diagrams associated with a particular pair of detection probabilities and the simple reward structure need not be the same as that which occurs when an arbitrary set of reward coefficients is used.

When both moving regions extend into the recurrent region and the appropriate chain diagram is known, the payoff functions can be calculated in much the same way as in Sec. 4.7. The chain diagram must include the moving states $s_-$ and $s_+$, and each interval in the no-move region must have an associated state in the chain. As a result, the two search states optimum at P* must be included in the chain and no additional transient states are required. Once the payoff coefficients have been introduced, the same set of equations may be used to obtain a solution. The function $U'_-(P)$ may be expressed in terms of $U_+(P)$, and $U'_+(P)$ may be expressed in terms of $U_-(P)$, by means of Eq. (4-8)§. Equation (4-8)§ is identical to Eq. (2-13)§ in Sec. 6.2, just as Eq. (4-8) is identical to Eq. (2-13). Equation (4-9) must be modified to

$$a_- - b_- = \mu_1 ,$$

$$b_+ - a_+ = \mu_2 .$$
(4-9)§

The functions $U_-(P)$ and $U'_-(P)$ must again intersect at $P_-$, and $U_+(P)$ and $U'_+(P)$ must intersect at $P_+$. Therefore, as previously,

$$a_- P_- + b_-(1 - P_-) = a'_- P_- + b'_-(1 - P_-) ,$$

$$a_+ P_+ + b_+(1 - P_+) = a'_+ P_+ + b'_+(1 - P_+) .$$
(4-10)

The  sequences  of  looks  which  transform  P  into  P^  and  P^  into  Pq  can  also  be  found  from  the 
chain  diagram,  and  Eq.  (4-11)  remains  the  same  as  before: 


P.r, 


+  (1 


Pj)  r, 


(4-11) 


Finally,  Eq.  (4-12)  must  be  modified  and  we  find  that 


Pq  [<Pi  -  Pq*  (P2'^1  "  ^0*^1 '^2 

+  (1  -  Pq)  [(P2-n2)  ’•2-q2’'2  +  '’2'"+!  •  (4-12)§ 


Once  this  set  of  equations  has  been  solved,  the  payoffs  associated  with  the  other  states  in  the 
chain  can  be  calculated  by  means  of  Eq.  (4-8)§  and  the  remaining  breakpoints  can  be  found  by 
using  Eq.  (4-11). 

If only $P_-$ or $P_+$ belongs to the interior of the recurrent region, the payoff can be calculated inside the recurrent region in the same manner as in Chapter 4 once the appropriate equations have been modified as above. Although U(P) must always be a maximum at a point that lies in the no-move region, this point need not lie in the recurrent region. Transient states associated with intervals that lie in the no-move but not the recurrent region can be attached to the recurrent chain in exactly the manner discussed in Sec. 6.2. Once this has been done, the payoffs associated with each of these states can be calculated by using Eq. (4-8)§, since no moving occurs during the look sequence that transforms such a state into the recurrent chain.

It  is  perhaps  worth  noting  here  that,  with  the  simple  reward  structure,  intervals  may  also 
exist  which  lie  in  the  no-move  but  not  in  the  recurrent  region.  For  example,  in  Fig.  9(e),  the 
bounding point P_ may shift to the left of Pqj = P_2 before P^ shifts from P^ to P^. This situation would have no effect on the recurrent chain in Fig. 10(e). Therefore, it would have no effect on
the  manner  in  which  the  payoffs  associated  with  the  states  in  this  chain  are  calculated.  If  such 
a  shift  were  to  occur,  a  linear  interval  ir_^  would  result.  This  interval  would  lie  in  the  no-move 
but  not  the  recurrent  region.  The  associated  state  s_^  would  be  a  transient  state  and  would  be 
transformed  into  s^  by  an  optimum  look  into  box  2. 

The possibility of such behavior was not mentioned in Chapter 4 because it was tacitly assumed that P* would always lie inside the recurrent as well as the no-move region. Although this assumption has not been proved, it is the author's opinion that it is indeed valid. However,
if  this  faith  were  contradicted  by  some  special  example,  the  result  would  not  be  catastrophic, 
for  the  payoff  associated  with  a  state  such  as  s__j  in  the  above  example  could  be  calculated  easily. 
Once  this  had  been  done,  the  evader's  good  strategy  could  be  calculated  as  before.  The  searcher's 
good  strategy  also  could  be  found  in  the  usual  manner  once  a  transient  state  was  attached  to 
(T^  by  a  look  into  box  2  (see  Fig.  18).  Note  that  in  such  a  situation  there  would  be  no  correspond¬ 
ing  recurrent  state  since  only  states  having  associated  intervals  in  the  recurrent  region 

belong  to  the  recurrent  chain. 

In  the  search  evasion  game  with  the  generalized  reward  structure,  the  searcher's  good 
strategy  can  be  derived  easily  once  games  F  and  F'  have  been  solved.  The  searcher's  good 
strategy  must  again  be  Markovian  and  the  recurrent  chain  of  the  transition  diagram  is  identical 
to  the  recurrent  chain  of  the  chain  diagram  of  games  F  and  F'  after  the  move  transitions  have 
been  deleted.  A  transient  state  cr}  is  associated  with  each  interval  in  the  no-move  region  and 
transforms  into  rr  or  <r.  in  exactly  the  same  manner  as  !r .  transforms  into  ir  or  ir  , .  If  P  and 
P^  do  not  both  belong  to  the  interior  of  the  recurrent  region,  there  may  exist  intervals  that  lie 
in the no-move but not the recurrent region. As was just mentioned, such an interval will have an associated transient state $\sigma'_i$ but not a recurrent state $\sigma_i$ in the transition diagram.

The  searcher  can  limit  the  evader  to  U(P)  when  the  initial  P  is  known  as  long  as  U.(P)  = 

W.(P)  for  each  of  the  moving  states  that  belongs  to  the  recurrent  chain  of  the  transition  diagram. 

The  fundamental  functional  equations  of  games  H  and  H'  necessary  now  are 


W!(P)  =  y.(l) 


C  P[(Pi  -  ]  +  (1  -  P) 

+  fPr,  +  l-P]Wj|,  7l-p] 


+yi(2) 


PPlT2  +  (1  -  P)  [(P2  -  7]^)  T2  -  ^2^^] 
+  [P+(1-P)  r2]W.|2 


(5-l)§ 


and 


W.(P)  = 


+  W!(0) 


dW!(P) 

dP 


WJ(P) 


dW'(P) 

<  -d^  ^^^1 


I  -  P)  +  W!(l) 


dW!(P) 


dP 


(5-2)§ 
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+  +  (P2  -  +  '^-1 1 

where  represents  the  sequence  that  transforms  a  into  o'^^- 

If only one of the bounding points of the no-move region belongs to the interior of the recurrent region, only one moving state and one mixed state will occur in the recurrent chain of the transition diagram. The good probability distribution associated with the mixed state can be calculated as in Chapter 5 once the appropriate equations have been modified as they were above.

When the good probability distributions associated with the mixed states have been found, the starting rule needed in G where the initial P is unknown can be calculated exactly as before. The starting rule merely provides a probability distribution for selecting the state in which the Markov process starts. Since a look is not associated with this selection, the equation used to calculate $Y_0$ is identical to that used in Chapter 5.

CHAPTER  7
DISCOUNTING


7.1  INTRODUCTION 

In the last chapter, the reward structure of the search evasion game was generalized. Nevertheless, the contribution to the payoff of any particular event, i.e., the utility of the event, was still considered independent of when the event occurred. In many cases, however, this is not appropriate. For example, if a reward of one dollar is associated with a given event, the utility of the event should be greater if the event occurs immediately rather than in the future. A dollar in hand can be invested and earn interest.

In this chapter, we shall consider the behavior of the search evasion game when future rewards must be discounted. The term discounting is used when the utility of an event can be found by multiplying the associated reward (the utility that applies when the event occurs) by a discount factor. This discount factor, which is applied to all rewards, must be a function of only the difference between the time at which the utility is evaluated and the time when the event occurs. Thus, it must have the property of stationarity. A further restriction which will be imposed in this chapter is that the discount factor must decay exponentially with time.

Such a discount factor is clearly appropriate when the various rewards are made in monetary units. In the example of revenuer vs moonshiner, this is the case. If the moonshiner is able to invest his profits so that they earn compound interest at a rate $\alpha$ per unit time, one dollar invested at t = 0 increases in value according to the function $e^{\alpha t}$. By reversing this reasoning, we find that a reward of one unit received at time t should have a utility at t = 0 of $e^{-\alpha t}$. Thus, $e^{-\alpha t}$ is the discount factor. If interest is compounded only at discrete time intervals, as for example in a savings bank, the discount factor does not decay continuously but at discrete intervals. As long as the interest per period is of the order of a few percent or less, the approximation of continuous compounding is very good.
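
The quality of the continuous-compounding approximation can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the continuous rate is assumed to be matched to the per-period rate so that the two discount factors agree at the end of every period.

```python
import math

def discrete_discount(rate_per_period: float, t: float) -> float:
    """Discount factor when interest is credited only at the end of whole periods."""
    return (1.0 + rate_per_period) ** (-math.floor(t))

def continuous_discount(rate_per_period: float, t: float) -> float:
    """Exponential factor exp(-alpha * t), with alpha chosen (an assumption of this
    sketch) so that both factors coincide at the end of each period."""
    alpha = math.log(1.0 + rate_per_period)
    return math.exp(-alpha * t)

def worst_relative_gap(rate_per_period: float, periods: int = 10, steps: int = 100) -> float:
    """Largest relative difference between the two factors on a uniform time grid."""
    worst = 0.0
    for k in range(periods * steps):
        t = k / steps
        disc = discrete_discount(rate_per_period, t)
        cont = continuous_discount(rate_per_period, t)
        worst = max(worst, abs(disc - cont) / disc)
    return worst
```

Under these assumptions the relative gap never exceeds the interest earned in a single period, which is the sense in which a rate of a few percent per period makes the continuous approximation a close one.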

When rewards are not made in monetary units, exponential discounting is often still appropriate and in many other situations it serves as a useful approximation. One must, of course, be careful that the utility of a reward decays in a manner which depends only on the total decay time and not on the time when the decay begins.

As an example of a reward that does not satisfy this requirement of stationarity, consider the utility of information concerning the fixing of a horse race. Such information clearly has a high utility to a prospective wagerer if it is received before the race is run. Once the race is over, however, it has no value at all (except, of course, to a race official). Thus, the utility of such information does not depend upon how far in the future it is received but on when it is received relative to the time of the race.

As we can see, the restriction of stationarity implied by the term discounting is a very strong one. On the other hand, the further requirement that the discount factor decay exponentially with time does not restrict its applicability appreciably more. A little thought will show that in most cases if the utility of a reward decays in a nonexponential manner, the requirement of stationarity itself is actually violated.

As an example of a situation in which exponential discounting of rewards may be appropriate when the rewards do not involve money, consider a search evasion game in which the evader is a clandestine manufacturer of ballistic missiles violating an arms control agreement and in which the searcher is a member of an inspectorate set up to police this agreement. Here, the evader manufactures weapons rather than moonshine. The utility of a given stockpile of missiles must be defined in terms of the political power (sudden ultimatum, etc.) which such a stockpile gives to the state in question, and not in terms of money. In this situation, the rate at which these weapons are amassed may be very important to the evader. He may have to make political concessions that are distasteful to him until he has a sufficient stockpile. Also, in time, an effective antimissile missile may be developed by his opponent, making his missiles obsolete. Exponential discounting may be useful as a device for approximating the evader's interest in quick returns.

One  must  bear  in  mind  that  when  any  realistic  situation  is  modeled,  many  assumptions  and 
approximations  are  usually  necessary  before  the  problem  can  be  simplified  to  the  point  where  it 
can  be  handled  analytically.  When  rewards  do  not  involve  money,  the  problem  of  establishing  a 
set  of  utilities  for  the  various  possible  events  usually  poses  far  more  difficulties  than  does  the 
problem  of  finding  an  appropriate  discount  factor.  In  the  above  example,  the  assumption  that  the 
utility  of  a  stockpile  of  missiles  is  proportional  to  the  number  of  missiles  (an  automatic  result  of 
the  model)  is  far  more  open  to  criticism  than  is  the  discounting  device. 

In  this  chapter  we  shall  use  the  reward  coefficients  t;^,  t.  and  that  were  defined  in 
Sec.  6.1.  In  addition,  let  us  define 

$\alpha$ = interest rate,

$d = e^{-\alpha}$ = discount factor per unit time,

$\gamma_i = \dfrac{1 - d^{\tau_i}}{\alpha}$ = effective search time for box i.

When the evader hides in box i and the searcher looks into box j (j ≠ i), the evader receives income at a rate $\rho_i$ for $\tau_j$ units of time. The reward of this event, which is equal to the utility that applies when the event begins, is

$$\rho_i \int_0^{\tau_j} e^{-\alpha t}\, dt = \rho_i \gamma_j .$$

If the evader is hiding in box j, the reward is $(\rho_j - \eta_j)\gamma_j$. This is equivalent to the $\alpha = 0$ case where the search time for box j is $\gamma_j$. As a result, $\gamma_j$ is called the effective search time. Both $\tau_j$, the actual search time, and $\gamma_j$ will be used in our equations.
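
The relation between the discounted income over one look and the effective search time can be verified with a short numerical sketch. The function names and sample values below are illustrative assumptions; only the definition of the effective search time as $(1 - d^{\tau})/\alpha$ with $d = e^{-\alpha}$ is taken from the text.

```python
import math

def effective_search_time(alpha: float, tau: float) -> float:
    """gamma = (1 - d**tau) / alpha with d = exp(-alpha); reduces to tau when alpha = 0."""
    if alpha == 0.0:
        return tau
    # expm1 keeps the result accurate when alpha * tau is very small
    return -math.expm1(-alpha * tau) / alpha

def discounted_income(rho: float, alpha: float, tau: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of rho * exp(-alpha * t) over [0, tau]."""
    h = tau / steps
    return h * sum(rho * math.exp(-alpha * (k + 0.5) * h) for k in range(steps))
```

For any positive interest rate the effective search time is shorter than the actual one, and the two coincide in the limit of no discounting.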

A final detection loss (or negative reward) can be incurred by the evader when he is found. This loss may be used to account for a penalty that the evader incurs as a result of being found and also for the loss of earnings, in the expected sense, that can result if he can be found sometime during the look rather than just at the end. The utility of this loss when it is evaluated at the end of the final look will be represented by $\lambda_i$.

Now that time is an important consideration, we must also consider the time during which moving can occur. Usually, we can expect that a dead time between looks provides the opportunity for this action. It is convenient to let the search time $\tau_i$ include the dead time at the end of the look. This may require some adjustments in the various earning rates, and so forth, but usually there is no reason for assuming that the evader cannot continue earning during the dead time. In fact, the moving cost may result partially from the loss in earnings which occurs during the time required for the move. The value of each $\mu_i$ will be defined as the loss in utility that applies just before the next look and hence at the end of the time available for moving. This definition allows us to write our equations as though moving occurred instantaneously.

The introduction of discounting into the search evasion game does not appreciably affect its general behavior but does increase the notational complexity of some of the equations. As a result, we shall merely paraphrase the developments of Chapters 2 through 5 as we did in Chapter 6 when the generalized reward structure was introduced. Any equation that must be modified will again bear its original number, but this time the double section sign will be attached. Also, any changes in the game's properties that affect the previous results will be discussed.

7.2  INADMISSIBLE  BOXES 

Perhaps the chief phenomenon that is introduced by discounting and that must be considered before continuing concerns inadmissible boxes. In the previous chapters, we found that the evader should always hide in either box with a nonzero probability and that the searcher's good strategy always requires at least one look into each of them. This is true even if one box has a much lower detection probability q or a much higher earning rate $\rho$. The evader's good strategy requires P to be unequal to zero or one, since otherwise the searcher, if he knew this strategy, could always look into the correct box. Similarly, the searcher's good strategy always results in some looks into each box, for otherwise the evader could receive an infinite payoff. When discounting applies, however, the evader can never receive an infinite payoff. As a result, we may find that the detection probabilities, earning rates, and so forth, are biased so much in the favor of one box that the inferior one is not used by either player. If this occurs, the box is inadmissible.

The conditions under which a box is inadmissible can easily be found. To do this, first consider the case where the evader hides and remains in box i and the searcher always looks into box j. In this situation, the evader has an earning rate $\rho_i$ that continues for all time. Therefore, he receives a total payoff equal to $\rho_i/\alpha$. If, on the other hand, he were to hide in box j until he were found, while the searcher always looked there, he would receive a payoff equal to

$$U = (\rho_j - \eta_j)\gamma_j - q_j d^{\tau_j}\lambda_j + r_j d^{\tau_j} U ,$$

$$U = \frac{(\rho_j - \eta_j)\gamma_j - q_j d^{\tau_j}\lambda_j}{1 - r_j d^{\tau_j}} .$$

Thus, if

$$\frac{\rho_i}{\alpha} < \frac{(\rho_j - \eta_j)\gamma_j - q_j d^{\tau_j}\lambda_j}{1 - r_j d^{\tau_j}} ,$$

box i is inadmissible, for the evader would be foolish to hide in box i even if the searcher always looked into box j. Similar reasoning shows that the searcher should never look into an inadmissible box unless he knows that the evader is foolishly hiding there with a sufficient nonzero probability. Since we require each $\rho$, $\eta$, $\tau$, and $\lambda$ to be nonnegative, both boxes cannot be inadmissible.

If one of the boxes is inadmissible, the two-box game loses all interest, for the game degenerates to a trivial one-box game. It should be clear that the inadmissibility condition does not depend upon the moving costs or, in fact, on whether moving is allowed or not. In the remaining sections of this chapter, we shall assume that neither box is inadmissible.
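
The inadmissibility test compares two closed-form payoffs and is easy to evaluate. The sketch below is illustrative only: the function names and sample numbers are hypothetical, and the miss probability is assumed to be $r_j = 1 - q_j$.

```python
import math

def stay_payoff(alpha, tau_j, rho_j, eta_j, lam_j, q_j):
    """Payoff to an evader who stays in box j while the searcher always looks there."""
    d_tau = math.exp(-alpha * tau_j)      # d ** tau_j, with d = exp(-alpha)
    gamma_j = (1.0 - d_tau) / alpha       # effective search time for box j
    r_j = 1.0 - q_j                       # assumed: probability that a look misses
    return ((rho_j - eta_j) * gamma_j - q_j * d_tau * lam_j) / (1.0 - r_j * d_tau)

def is_inadmissible(alpha, rho_i, tau_j, rho_j, eta_j, lam_j, q_j):
    """Box i is inadmissible if earning rho_i forever, undisturbed, is still worth
    less than hiding in box j under constant search."""
    return rho_i / alpha < stay_payoff(alpha, tau_j, rho_j, eta_j, lam_j, q_j)
```

With an interest rate of 0.1, for instance, a box that earns 1 per unit time is inadmissible against a box that earns 10 per unit time and has a detection probability of 0.1, while the reverse comparison fails, in line with the remark that both boxes cannot be inadmissible.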

7.3  $G^\infty$:  THE  NO-MOVE  GAME

When moving is not allowed, the modified game $F^\infty$ may again be used. The payoff function $U^\infty(P)$ has the same properties of being continuous and convex. Perhaps the only difference in its general appearance worthy of note is that the magnitude of the slope of this function no longer becomes arbitrarily large as P approaches zero and one. This follows from the fact that the payoff is no longer infinite if the evader hides in one box and the searcher always looks into the other. The fundamental recursion equation that now applies is

$$U^\infty(P) = \min \begin{cases} U^\infty(P;1) = P[(\rho_1 - \eta_1)\gamma_1 - q_1 d^{\tau_1}\lambda_1] + (1-P)\rho_2\gamma_1 + d^{\tau_1}[Pr_1 + 1 - P]\, U^\infty\!\left[\dfrac{Pr_1}{Pr_1 + 1 - P}\right] \\[1ex] U^\infty(P;2) = P\rho_1\gamma_2 + (1-P)[(\rho_2 - \eta_2)\gamma_2 - q_2 d^{\tau_2}\lambda_2] + d^{\tau_2}[P + (1-P)r_2]\, U^\infty\!\left[\dfrac{P}{P + (1-P)r_2}\right] \end{cases}$$
(2-4)§§


There again exists a $P_0$ (see Appendix A) where the searcher should look into box 1 if P is greater than $P_0$, and into box 2 if it is less than $P_0$. The value of $P_0$ can be found in the usual way and is

$$P_0 = \frac{1/\beta_1}{1/\beta_1 + 1/\beta_2} ,$$

where

$$\beta_i = \frac{q_i d^{\tau_i}}{\gamma_i}(\rho_i + \alpha\lambda_i) + \alpha\eta_i .$$

Perhaps a simpler way of expressing this rule now that the above expression is so complex is simply to state that the searcher should look into that box for which the associated expression

$$P_i\beta_i = P_i\left[\frac{q_i d^{\tau_i}}{\gamma_i}(\rho_i + \alpha\lambda_i) + \alpha\eta_i\right]$$

is the larger, where $P_1 = P$ and $P_2 = 1 - P$. Note that if $\alpha$ is very small, $\gamma_i \approx \tau_i$, $d^{\tau_i} \approx 1$, and $P_0$ is approximately equal to $(\tau_1/\rho_1 q_1)/(\tau_1/\rho_1 q_1 + \tau_2/\rho_2 q_2)$, as in Chapter 6.
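
The look rule can be sketched in a few lines. The code below assumes the coefficient $\beta_i = (q_i d^{\tau_i}/\gamma_i)(\rho_i + \alpha\lambda_i) + \alpha\eta_i$ and the threshold $P_0 = (1/\beta_1)/(1/\beta_1 + 1/\beta_2)$; the function names, the dictionary layout for the box parameters, and the sample values are hypothetical.

```python
import math

def beta(alpha, tau, rho, eta, lam, q):
    """Coefficient beta_i for one box."""
    d_tau = math.exp(-alpha * tau)                 # d ** tau
    gamma = -math.expm1(-alpha * tau) / alpha      # effective search time
    return q * d_tau / gamma * (rho + alpha * lam) + alpha * eta

def threshold(alpha, box1, box2):
    """P_0: the searcher looks into box 1 above it and into box 2 below it."""
    b1, b2 = beta(alpha, **box1), beta(alpha, **box2)
    return (1.0 / b1) / (1.0 / b1 + 1.0 / b2)

def box_to_search(P, alpha, box1, box2):
    """Equivalent rule: pick the box with the larger P_i * beta_i."""
    return 1 if P * beta(alpha, **box1) > (1.0 - P) * beta(alpha, **box2) else 2

# Illustrative parameters only (not taken from the text).
BOX1 = dict(tau=1.0, rho=2.0, eta=0.5, lam=3.0, q=0.4)
BOX2 = dict(tau=2.0, rho=1.0, eta=0.2, lam=1.0, q=0.7)
```

As the interest rate shrinks, the computed threshold approaches the undiscounted expression of Chapter 6, which no longer involves the detection loss or the loss in earnings.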

Every term in the expression for $\beta_i$ is positive. Therefore, $0 < P_0 < 1$. This occurs even if one of the boxes is inadmissible. This should not be surprising since, even when a box is inadmissible, the searcher should look there if he knows that the probability that the evader is there is sufficiently high. On the other hand, $U^\infty(P)$ will be a maximum at P = 0 if box 1 is inadmissible and at P = 1 if box 2 is.

The recurrent region $(P_{01}, P_{02})$ has the same properties as in Chapter 6. When $r_1^{n_1} = r_2^{n_2}$, the search sequence in this region is again periodic. If $\log r_1/\log r_2$ is irrational, it can be approximated by $n_2/n_1$ as before. With discounting, this approximation can be looser than before, since the effects of incorrect looks in the distant future are ameliorated by the discount factor as well as by the decreasing probability of survival. With a given pair of integers $n_1$ and $n_2$, a chain diagram can be devised in the same manner as before. Transient states can be added when necessary, as was discussed in Chapter 6.

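The functional relationship between the payoff associated with a given state in the searcher's chain diagram and the one into which it is transformed by the next look is

$$s_i \xrightarrow{\;1\;} s_j : \qquad a_i = (\rho_1 - \eta_1)\gamma_1 - q_1 d^{\tau_1}\lambda_1 + r_1 d^{\tau_1} a_j , \qquad b_i = \rho_2\gamma_1 + d^{\tau_1} b_j ,$$

$$s_i \xrightarrow{\;2\;} s_j : \qquad a_i = \rho_1\gamma_2 + d^{\tau_2} a_j , \qquad b_i = (\rho_2 - \eta_2)\gamma_2 - q_2 d^{\tau_2}\lambda_2 + r_2 d^{\tau_2} b_j .$$
(2-11)§§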
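
For a short periodic sequence, compounding the single-look relations is straightforward to do numerically. The sketch below is an illustration under stated assumptions: the payoff of a state is taken to be $a_i P + b_i(1 - P)$, the miss probability is taken to be $r = 1 - q$, and the coefficients of a state that the sequence maps into itself are found by repeated substitution rather than by a closed form. All function and parameter names are hypothetical.

```python
import math

def look(box, a_j, b_j, alpha, tau, rho, eta, lam, q):
    """One application of the single-look relations: returns (a_i, b_i) for the state
    that a look into `box` transforms into the state with coefficients (a_j, b_j)."""
    i = box - 1                                    # index of the box being searched
    d_tau = math.exp(-alpha * tau[i])              # d ** tau_i
    gamma = (1.0 - d_tau) / alpha                  # effective search time
    hit = (rho[i] - eta[i]) * gamma - q[i] * d_tau * lam[i]
    r = 1.0 - q[i]                                 # assumed miss probability
    if box == 1:
        return hit + r * d_tau * a_j, rho[1] * gamma + d_tau * b_j
    return rho[0] * gamma + d_tau * a_j, hit + r * d_tau * b_j

def periodic_payoff(sequence, params, sweeps=2000):
    """Coefficients of a state that the look sequence transforms into itself.
    Each sweep substitutes backwards through the sequence; discounting makes the
    substitution a contraction, so the iteration converges."""
    a = b = 0.0
    for _ in range(sweeps):
        for box in reversed(sequence):
            a, b = look(box, a, b, **params)
    return a, b
```

When the sequence consists of looks into box 1 only, the result reduces to the payoff of an evader who stays in box 1 under constant search and to the undisturbed income $\rho_2/\alpha$ of an evader in box 2.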


If one wishes to express the payoff associated with a state $s_i$ in terms of that associated with a state $s_j$ when a sequence of looks transforms $s_i$ into $s_j$, our notation must be redefined slightly. In particular, the sequence must be defined by a set $\{\tau_m(n)\}$ where $\tau_m(n)$ is the time at which the nth look into box m is completed. We can again let $k_m$ represent the total number of looks into box m. Furthermore, it is convenient to let $T$ represent the total time of the sequence. Clearly, $T = \max\{\tau_1(k_1), \tau_2(k_2)\}$, and we find that

(2-13)§§


These equations are rather complex and it may prove simpler to compound Eq. (2-11)§§ if the sequence is fairly short. In Eq. (2-13)§§, the payoff associated with each of the possible times at which detection can occur is found by first calculating the utility contributed by the earning rate $\rho$, then subtracting the loss in earnings from each of the looks into the correct box, and finally deducting the detection loss. These equations could, of course, be formulated in many other ways. Perhaps the main reason for doing it this way is that it carries over fairly directly to the many-box case. When $s_i$ is transformed into itself by a sequence of looks, the coefficients $a_i$ and $b_i$ can be expressed in closed form by the usual extension of (2-13)§§.

Once the value of P at which $U^\infty(P)$ is a maximum and the payoffs associated with the search states optimum at this point have been found, the searcher's good strategy can be completed in the usual manner.


7.4  GAME  $G^0$

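This game may be solved in exactly the same manner as it was in Chapter 6 once the effects of discounting have been introduced into the necessary equations. Now, however, we must require that neither box be inadmissible. The evader's good strategy can be obtained from the solution of the equations

$$U^0(P_0) = U^0(P_0;1) = \frac{P_0[(\rho_1 - \eta_1)\gamma_1 - q_1 d^{\tau_1}\lambda_1] + (1 - P_0)\rho_2\gamma_1}{1 - d^{\tau_1}(1 - P_0 q_1)}$$

$$\phantom{U^0(P_0)} = U^0(P_0;2) = \frac{P_0\rho_1\gamma_2 + (1 - P_0)[(\rho_2 - \eta_2)\gamma_2 - q_2 d^{\tau_2}\lambda_2]}{1 - d^{\tau_2}[1 - (1 - P_0) q_2]} ,$$

where $0 < P_0 < 1$. The searcher's good strategy can be obtained from

$$W^0(Y_0) = W^0(Y_0;1) = \frac{Y_0[(\rho_1 - \eta_1)\gamma_1 - q_1 d^{\tau_1}\lambda_1] + (1 - Y_0)\rho_1\gamma_2}{1 - Y_0 r_1 d^{\tau_1} - (1 - Y_0) d^{\tau_2}}$$

$$\phantom{W^0(Y_0)} = W^0(Y_0;2) = \frac{Y_0\rho_2\gamma_1 + (1 - Y_0)[(\rho_2 - \eta_2)\gamma_2 - q_2 d^{\tau_2}\lambda_2]}{1 - Y_0 d^{\tau_1} - (1 - Y_0) r_2 d^{\tau_2}} ,$$

where $0 < Y_0 < 1$. Appendix D shows that these solutions exist and that $U^0(P_0) = W^0(Y_0) = V^0$, the value.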
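
Because each pair of equations has a single unknown, the two good strategies can be located by bisection. The sketch below is one possible numerical treatment and not the method of the text: it assumes $r_i = 1 - q_i$, assumes that neither box is inadmissible so that each difference changes sign on (0, 1), and uses hypothetical function names.

```python
import math

def bisect(f, lo=0.0, hi=1.0, iters=200):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_g0(alpha, tau, rho, eta, lam, q):
    """Returns (P0, Y0, U0(P0), W0(Y0)) for the free-move game with discounting."""
    D = [math.exp(-alpha * t) for t in tau]        # d ** tau_i
    g = [(1 - Di) / alpha for Di in D]             # effective search times
    r = [1 - qi for qi in q]                       # assumed miss probabilities
    hit = [(rho[i] - eta[i]) * g[i] - q[i] * D[i] * lam[i] for i in range(2)]

    def U(P, box):   # evader hides in box 1 with probability P before every look
        if box == 1:
            return (P * hit[0] + (1 - P) * rho[1] * g[0]) / (1 - D[0] * (1 - P * q[0]))
        return (P * rho[0] * g[1] + (1 - P) * hit[1]) / (1 - D[1] * (1 - (1 - P) * q[1]))

    def W(Y, box):   # searcher looks into box 1 with probability Y at every look
        if box == 1:
            return (Y * hit[0] + (1 - Y) * rho[0] * g[1]) / (1 - Y * r[0] * D[0] - (1 - Y) * D[1])
        return (Y * rho[1] * g[0] + (1 - Y) * hit[1]) / (1 - Y * D[0] - (1 - Y) * r[1] * D[1])

    P0 = bisect(lambda P: U(P, 1) - U(P, 2))
    Y0 = bisect(lambda Y: W(Y, 1) - W(Y, 2))
    return P0, Y0, U(P0, 1), W(Y0, 1)
```

In the symmetric test case with equal earning rates and look times, no losses, and detection probabilities 0.5 and 0.3, both players use the probability $q_2/(q_1 + q_2)$ for box 1 and the two payoffs agree, as the equality of $U^0(P_0)$ and $W^0(Y_0)$ requires.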
The above equations will not, of course, yield a solution if one of the boxes is inadmissible. When this occurs, the two curves $U^0(P;1)$ and $U^0(P;2)$ still intersect at some point $P_0$ in the interior of the interval (0, 1). Hence, there still exists a strategy for the evader that yields a payoff independent of the searcher's strategy. This is not the evader's good strategy, however, for he can guarantee a larger payoff by hiding in the admissible box with probability one; that is, $U^0(P_0;1) = U^0(P_0;2) < \max_P \{\min_i U^0(P;i)\}$. It should not be surprising to find, in contrast, that the two curves $W^0(Y_0;1)$ and $W^0(Y_0;2)$ do not intersect in (0, 1) when a box is inadmissible. If

U  9  p.“y^  (t  i  I  o(  Hj*  .'wvl*,r'a 

In  general,  when  each  player  has  a  strategy  that  yields  a  payoff  entirely  independent  of  the  other's, 
the  payoffs  must  be  equal  and  the  strategies  must  be  good  strategies. 

7.5  GAME  G 

When one or both of the moving costs are no longer equal to zero, the techniques developed in Chapters 4 and 5 may again be used once the appropriate changes have been introduced into the various equations. The fundamental functional equations for the modified games F and F' are now

$$U'(P) = \min \begin{cases} U'(P;1) = P[(\rho_1 - \eta_1)\gamma_1 - q_1 d^{\tau_1}\lambda_1] + (1-P)\rho_2\gamma_1 + d^{\tau_1}[Pr_1 + 1 - P]\, U\!\left[\dfrac{Pr_1}{Pr_1 + 1 - P}\right] \\[1ex] U'(P;2) = P\rho_1\gamma_2 + (1-P)[(\rho_2 - \eta_2)\gamma_2 - q_2 d^{\tau_2}\lambda_2] + d^{\tau_2}[P + (1-P)r_2]\, U\!\left[\dfrac{P}{P + (1-P)r_2}\right] \end{cases}$$
(4-3)§§

and

$$U(P) = \max_{P'} \begin{cases} -\mu_1(P' - P) + U'(P'), & P' \ge P \\ -\mu_2(P - P') + U'(P'), & P' \le P \end{cases}$$
(4-4)§

The functions U(P) and U'(P) again have the same basic properties that allow the previous solution techniques to be used. Both functions are continuous and convex. In general, they will be piecewise linear if the moving costs are not prohibitive. In game F' there exists a $P_0$ where $0 < P_0 < 1$ that has the usual properties. The proof that these properties are still satisfied is found in Appendix B.
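
Because discounting makes the pair of functional equations a contraction, U(P) and U'(P) can also be approximated by plain successive substitution on a grid. This is a numerical sketch offered as an alternative to the chain-diagram technique of the text, not a description of it. It assumes the forms of the two functional equations given in the lead-in comments of the code, $r_i = 1 - q_i$ with $q_i < 1$, and a uniform grid with linear interpolation; all names are hypothetical.

```python
import math

def interp(x, vals):
    """Linear interpolation of vals, tabulated on a uniform grid over [0, 1]."""
    n = len(vals) - 1
    pos = min(max(x, 0.0), 1.0) * n
    i = min(int(pos), n - 1)
    w = pos - i
    return (1 - w) * vals[i] + w * vals[i + 1]

def move_cost(P, P_new, mu):
    """mu[0] per unit increase of P (move into box 1), mu[1] per unit decrease."""
    return mu[0] * (P_new - P) if P_new > P else mu[1] * (P - P_new)

def value_iteration(alpha, tau, rho, eta, lam, q, mu, n=41, sweeps=400):
    """Successive substitution: U'(P) = min over the two looks of immediate reward
    plus discounted continuation, then U(P) = max over P' of U'(P') minus moving cost.
    Returns (grid, U, U_prime)."""
    D = [math.exp(-alpha * t) for t in tau]
    g = [(1 - Di) / alpha for Di in D]
    r = [1 - qi for qi in q]
    hit = [(rho[i] - eta[i]) * g[i] - q[i] * D[i] * lam[i] for i in range(2)]
    grid = [k / (n - 1) for k in range(n)]
    U = [0.0] * n
    Up = U
    for _ in range(sweeps):
        Up = []
        for P in grid:
            s1 = P * r[0] + 1 - P              # survival probability, look into box 1
            s2 = P + (1 - P) * r[1]            # survival probability, look into box 2
            u1 = P * hit[0] + (1 - P) * rho[1] * g[0] + D[0] * s1 * interp(P * r[0] / s1, U)
            u2 = P * rho[0] * g[1] + (1 - P) * hit[1] + D[1] * s2 * interp(P / s2, U)
            Up.append(min(u1, u2))             # the searcher minimizes
        # the evader maximizes over the position P' reached by moving
        U = [max(Up[j] - move_cost(P, grid[j], mu) for j in range(n)) for P in grid]
    return grid, U, Up
```

Two limiting cases give a quick check of the sketch: with zero moving costs U(P) is flat and equal to the value of the free-move game, and with very large moving costs U(P) and U'(P) coincide because the evader never moves.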

The no-move region $(P_-, P_+)$ is again defined by Eq. (4-5)§. The moving costs are prohibitive if they both satisfy Eq. (4-7)§. When this occurs, both U(P) and U'(P) are again identical to $U^\infty(P)$ over the no-move region and also over the recurrent region, which it contains under these conditions. As we have seen, once the moving costs become prohibitive, the searcher's good strategy becomes identical to that in $G^\infty$. The evader should never move as long as the searcher uses this good strategy. We also found that the moving regions never completely disappeared as long as $\mu_1$ and $\mu_2$ were finite (unless $q_1$ or $q_2$ = 1). It was necessary to calculate the values of $P_-$ and $P_+$ if we wished to obtain the evader's complete good strategy. These bounding points of the no-move region are of use to the evader when the searcher uses an inadmissible sequence that transforms P into a moving region.

With one exception, all of these properties still hold when discounting is considered. The one exception is that the moving regions can now disappear completely when μ_1 and μ_2 are finite. This can occur because the magnitude of the slope of U^∞(P) no longer approaches infinity as P approaches zero or one. In game F^∞, the searcher should always look into box 1 if P = 1. Therefore,


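$$\lim_{P\to 1} U^\infty(P) = \frac{(\rho_1-\eta_1)\gamma_1 - q_1\lambda_1 d^{\tau_1}}{1 - r_1 d^{\tau_1}}\,P + \frac{\rho_2}{\alpha}\,(1-P).$$

Similarly,

$$\lim_{P\to 0} U^\infty(P) = \frac{\rho_1}{\alpha}\,P + \frac{(\rho_2-\eta_2)\gamma_2 - q_2\lambda_2 d^{\tau_2}}{1 - r_2 d^{\tau_2}}\,(1-P).$$

It follows that if

$$\mu_1 > \frac{\rho_1}{\alpha} - \frac{(\rho_2-\eta_2)\gamma_2 - q_2\lambda_2 d^{\tau_2}}{1 - r_2 d^{\tau_2}}$$

and

$$\mu_2 > \frac{\rho_2}{\alpha} - \frac{(\rho_1-\eta_1)\gamma_1 - q_1\lambda_1 d^{\tau_1}}{1 - r_1 d^{\tau_1}},$$
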

U(P) and U'(P) will be identical to U^∞(P) over the entire interval (0, 1). Under these conditions, the evader should never move. Of course, as μ_1 and μ_2 increase from zero, they will become prohibitive before both of the above conditions occur. The above bounds, however, are easily calculated and may possibly indicate that the moving costs are definitely prohibitive when μ_1 and μ_2 are very large. Furthermore, they show that when α is unequal to zero the moving cost will be prohibitive for sufficiently large but finite moving costs even when one, but not both, of the detection probabilities is equal to one.
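The two vertex limits can be checked with a short derivation. This is only a sketch: it assumes the symbol reading used in the Chapter 8 reward structure (earning rate ρ_i, earning-rate loss η_i, detection loss λ_i, effective search time γ_i, discount factor d) and the miss probability r_i = 1 - q_i.

```latex
% Evader sits in box 1 and the searcher looks only into box 1.
% One look pays c_1; with probability r_1 it fails and the situation repeats,
% discounted by d^{\tau_1}:
\begin{align*}
a_1 &= c_1 + r_1 d^{\tau_1} a_1,
  \qquad c_1 = (\rho_1-\eta_1)\gamma_1 - q_1\lambda_1 d^{\tau_1}, \\
a_1 &= \frac{c_1}{1 - r_1 d^{\tau_1}} .
\end{align*}
% Evader sits in box 2 and is never examined: he earns \rho_2 forever,
% discounted continuously at rate \alpha (d = e^{-\alpha}):
\begin{equation*}
\int_0^\infty \rho_2\, e^{-\alpha t}\, dt = \frac{\rho_2}{\alpha}.
\end{equation*}
% Near P = 1 the payoff is the P-weighted average of the two cases:
\begin{equation*}
U^\infty(P) \approx a_1 P + \frac{\rho_2}{\alpha}(1-P),
\end{equation*}
% so the slope there has finite magnitude |a_1 - \rho_2/\alpha| whenever \alpha > 0.
```

The finite slope is what allows a finite moving cost to rule out moving over the whole interval.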

When the moving costs are not prohibitive, games F and F' may be solved by going through the same process of studying the manner in which the optimum chain diagram changes from strategy interval to strategy interval as μ_1 and μ_2 increase in constant ratio. Also, the usual techniques may be used to calculate U(P) and U'(P) once the correct chain diagram has been found.

To avoid repetition, only those equations in Chapter 6 that must be changed will be listed here.


Equation (4-8)§ could be used in that chapter to express the payoff of a state s_i in terms of transitions. Thus, it could be used to express U|(P) in terms of U (P), and so forth. This equation is identical to Eq. (2-13)§, which was used in the no-move game. It must now be replaced by Eq. (2-13)§§.

Equation (4-12)§ must also be rewritten and is now


o[(Pi-’Ii)  Vi-q^d  +  “  Pq)  h^i  +  V]  =  Pfll' 


Pl^Z 


+  d 


+  (1  -  Pq)  [(P2  -  V2)  yz  -  ^2"^ 


'^2  + 


»,l 


(4-10)§§ 


This  equation  is  used  in  the  above  form  when  belongs  to  the  interior  of  w_  and  to  the  in¬ 
terior  of  TT^.  As  in  Chapter  4,  the  coefficients  a  and  b  or  a^  and  b^  must  be  replaced  by  those 
appropriate when only one of the moving regions extends into the recurrent region (see Sec. 4.7.1). Equations (4-9)§, (4-10) and (4-11) can be used without any alterations since discounting has no effect on them.

Once  games  F  and  F'  have  been  solved  and  the  evader's  good  strategy  in  G  has  been  found, 
the  searcher's  good  strategy  can  also  be  obtained  in  the  usual  manner.  The  functional  equations 
of  games  H  and  H'  are  now 


and 


W.(P)  = 


^(1) 

p[(Pl-r,i)  Yj- 

qid 

+  (1  -P)  P2Y1 

T. 

+  d  [Pr^  +  1 

-P]  W,|, 

f  1 

[pr^+  1-pJJ 

yi(2) 

Pp^Y2  +  (1-  P) 

[(P2-92)  ^2-^12^  ^^2) 

•>'2 

+  d  ‘(p  +  (1- 

P)  r2]  W.| 

(  ^ 

2  1p  +  (1  -  P) 

dW,'(P) 

-i.2P 

+  W|(0) 

1 

'  dP 

<-^2 

dW.'(P) 

W^(P) 

,  -P2  < 

dP  ^  '^1 

dW!(P) 

-Ml(^ 

-  P)  +  w;(i) 

'  dP 

>P 

(5-l)§§ 


(5-2)§ 


The correct transition diagram can be derived from the searcher's chain diagram of game F', once that game has been solved, in the usual manner. The only computational changes required in calculating the probability distributions of the mixed states are those which result from the new form of Eq. (5-1)§§. The reasoning used in Chapter 5 and Appendix C to show that the searcher could indeed limit the evader to U(P) when the initial P is known is still valid in the discounting case. To complete the searcher's good strategy, the starting rule can be computed precisely as in Chapter 5. Discounting has no effect on this computation. No look, hence no time, is involved in the selection of a starting state for the Markov process that generates the search sequence.

This completes the discussion of the two-box search evasion game. As we have seen, all of the important properties of this game occur with the simple reward structure used in Chapters 2 through 5. Discounting, of course, raises the interesting possibility that a box may be inadmissible. This occurs only in extremely biased cases, however, when the interest rate is fairly high. The various equations become more complex algebraically when the generalized reward structure and discounting are introduced. On the other hand, we have seen that the same general computational methods are still valid, that the number of equations necessary for a given calculation does not increase, and that no new nonlinearities (except in G^0) arise. Thus, the increase in computational complexity is not great and is a small price to pay for the increase in generality achieved in these last two chapters.


CHAPTER  8 

THE  SEARCH  EVASION  GAME  WITH  N  BOXES 


8.1  INTRODUCTION 

In the previous chapters, the two-box search evasion game has been considered in some detail, and it is now appropriate to turn our attention to the more general N-box game. As would be expected, the behavior of the game becomes more complex when three or more boxes are involved. We shall first examine the limiting games G^∞ and G^0. These games behave much as before and only the computational effort becomes more involved. The good search strategy associated with G^0 will be of particular interest since it may always be used to limit the evader to V^0, the value of G^0, when the moving costs are unequal to zero or when evasive countermeasures other than moving are available to the evader.

When  game  G  is  considered,  we  shall  find  that  some  of  the  properties  fundamental  to  the 
solution  techniques  of  the  previous  chapters  no  longer  hold.  For  example,  the  searcher's  good 
strategy  can  no  longer  be  generated  by  a  simple  Markov  process,  and  there  no  longer  exists  a 
finite  number  of  strategy  intervals  as  the  moving  costs  increase  in  constant  ratio  from  zero  up 
to  a  point  where  they  are  all  prohibitive.  As  a  result,  no  general  method  for  solving  G  when 
there  are  more  than  two  boxes  has  been  developed.  A  simple  example  has  been  solved,  however, 
and  will  be  used  to  illustrate  some  of  the  problems  that  can  be  expected  in  the  search  for  exact 
solution  techniques.  It  will  also  indicate  the  extreme  magnitude  of  the  computational  effort  that 
could  be  expected  if  a  general  method  were  devised  and,  therefore,  the  desirability  of  finding  an 
efficient  method  for  obtaining  strategies  that  limit  the  evader  to  a  payoff  close  to  the  value.  A 
particular  approach  will  be  suggested  for  future  research. 

The reward structure that will be used for the N-box game is the same as that used in Chapter 6. Thus,

ρ_i = the evader's earning rate when he hides in box i and the searcher looks elsewhere,

η_i = the loss in earning rate when the searcher looks into the correct box,

τ_i = the time required to examine box i,

λ_i = the detection loss.

We  again  require  that  p^,  ^  0  and  >  0.  If  discounting  is  used,  we  can  again  let 

α = compound interest rate,

d = e^{-α} = discount factor per unit time,

γ_i = (1 - d^{τ_i})/α = effective search time.

8.2  GAME  G^∞

In G^∞, the evader cannot move between looks and the N-box case is quite similar to the two-box version. A strategy for the evader consists of the selection of a probability vector P = {p_1, p_2, ..., p_N} that is defined over a bounded N - 1 space (since Σ_{i=1}^{N} p_i = 1, p_i ≥ 0). A pure strategy for the searcher consists of an infinite search sequence that is used as long as necessary. The probability space over which P is defined can best be represented by a regular simplex of degree N - 1 and barycentric coordinates. In the three-box case the simplex is an equilateral triangle and in the four-box case a regular tetrahedron. For each coordinate p_i, there is an associated vertex or extreme point of the simplex where p_i = 1 and an opposite face over which p_i = 0. This face is the regular simplex of one lower degree that is generated by all of the remaining vertices. At any given point within the simplex, the value of p_i is equal to the distance from this point to the i-th face. Requiring the altitude of the simplex to equal one insures that Σ_{i=1}^{N} p_i = 1 for any P belonging to it.

In the modified game F^∞, the searcher is informed of the initial position of P and can calculate its a posteriori position after each unsuccessful look. Thus, if the searcher looks into box i, we find that

$$P \xrightarrow{\;i\;} P', \qquad p_i' = \frac{p_i r_i}{1 - p_i q_i}, \qquad p_j' = \frac{p_j}{1 - p_i q_i}, \quad j \ne i. \tag{8-1}$$

A sequence of looks that involves a total of k_i looks into box i for each i transforms P according to

$$P \xrightarrow{\;\{k_j\}\;} P', \qquad p_i' = \frac{p_i r_i^{k_i}}{\sum_{j=1}^{N} p_j r_j^{k_j}}. \tag{8-2}$$


As  before,  the  order  of  the  sequence  has  no  effect  on  the  final  transformation. 
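The a posteriori update of Eq. (8-1) and the closed form of Eq. (8-2) can be checked numerically. The sketch below assumes r_i = 1 - q_i (the probability that a look into the correct box fails); the function names and the sample numbers are illustrative only.

```python
from math import isclose

def look(p, q, i):
    """Posterior after one unsuccessful look into box i, as in Eq. (8-1)."""
    miss = 1.0 - p[i] * q[i]  # probability that the look finds nothing
    return [(pk * (1.0 - q[k]) if k == i else pk) / miss
            for k, pk in enumerate(p)]

def after_counts(p, q, counts):
    """Posterior after counts[k] unsuccessful looks into box k, as in Eq. (8-2)."""
    weights = [pk * (1.0 - qk) ** n for pk, qk, n in zip(p, q, counts)]
    total = sum(weights)
    return [w / total for w in weights]

p0 = [0.5, 0.3, 0.2]   # hypothetical prior over three boxes
q = [0.4, 0.6, 0.8]    # hypothetical detection probabilities

# Two looks into box 0 and one into box 1, taken in different orders.
order_a = look(look(look(p0, q, 0), q, 1), q, 0)
order_b = look(look(look(p0, q, 1), q, 0), q, 0)
closed_form = after_counts(p0, q, [2, 1, 0])
```

All three results agree, which is the order-independence property stated above.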

Given a particular P, the searcher must decide where to look next. Since P is transformed by the search process only, an optimum infinite search sequence can be associated with each P. As before, the payoff associated with any arbitrary sequence is linear in P. The payoff function U^∞(P), which results when an optimum sequence is used for the P in question, is formed by the lower bound on the ensemble of payoffs generated by all infinite search sequences. The function U^∞(P) must be continuous and convex. Furthermore, it may be piecewise linear in the interior of the simplex over which P is defined. That is, the simplex may be partitioned into a set of hypervolumes within each of which U^∞(P) is linear in P. Over each of these hypervolumes, a particular infinite search sequence is optimum. As P approaches any boundary, these hypervolumes must become arbitrarily small.

The functional equation that defines the optimum payoff function is now

$$U^\infty(P) = \min_i \{U^\infty(P; i)\},$$

where

$$U^\infty(P; i) = \gamma_i \sum_{k=1}^{N} \rho_k p_k - p_i\left(\eta_i\gamma_i + q_i\lambda_i d^{\tau_i}\right) + d^{\tau_i}(1 - p_i q_i)\, U^\infty(P'), \tag{8-3}$$

and

$$P \xrightarrow{\;i\;} P'.$$


The optimum search strategy for F^∞ can be derived heuristically in much the same manner as before. It is again convenient to let U^∞(P; ij) represent the payoff that results if a look into box i is followed by a look into box j and then by an optimum sequence. It follows from Eq. (8-3) that

$$U^\infty(P; ij) = \gamma_i \sum_{k=1}^{N} \rho_k p_k - p_i\left(\eta_i\gamma_i + q_i\lambda_i d^{\tau_i}\right) + d^{\tau_i}\left[\gamma_j \sum_{k=1}^{N} \rho_k p_k - p_i q_i \gamma_j \rho_i - p_j\left(\eta_j\gamma_j + q_j\lambda_j d^{\tau_j}\right)\right] + d^{\tau_i+\tau_j}(1 - p_i q_i - p_j q_j)\, U^\infty(P'),$$

where

$$P \xrightarrow{\;ij\;} P'.$$

If we set U^∞(P; ij) equal to U^∞(P; ji), the term U^∞(P') cancels, and we find that

$$p_i\beta_i = p_j\beta_j,$$

where

$$\beta_i = \frac{d^{\tau_i} q_i}{\gamma_i}\,(\rho_i + \alpha\lambda_i) + \alpha\eta_i.$$

When α = 0,

$$\beta_i = \frac{\rho_i q_i}{\tau_i}.$$

The equation p_i β_i = p_j β_j defines a hyperplane of degree N - 2 that intersects the line joining vertices i and j and also all of the remaining vertices. It therefore partitions the simplex into two parts. In the space where p_i β_i > p_j β_j, sequence ij + optimum is preferable to ji + optimum, and so forth.
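The cancellation that produces the index β_i can be written out in a few lines. The following is a sketch of one way to carry out the algebra, assuming γ_i = (1 - d^{τ_i})/α and the two-look payoff stated above; the shorthands S and A_i are introduced here only for brevity.

```latex
% Shorthand (not in the original): S = \sum_k \rho_k p_k,
%                                  A_i = \eta_i\gamma_i + q_i\lambda_i d^{\tau_i}.
\begin{align*}
U^\infty(P;ij) - U^\infty(P;ji)
 &= S\left[\gamma_i(1-d^{\tau_j}) - \gamma_j(1-d^{\tau_i})\right] \\
 &\quad - p_i\left[A_i(1-d^{\tau_j}) + q_i\rho_i\gamma_j d^{\tau_i}\right]
        + p_j\left[A_j(1-d^{\tau_i}) + q_j\rho_j\gamma_i d^{\tau_j}\right].
\end{align*}
% Since 1 - d^{\tau_j} = \alpha\gamma_j, the first bracket vanishes and
\begin{align*}
U^\infty(P;ij) - U^\infty(P;ji)
 &= \gamma_i\gamma_j\left(p_j\beta_j - p_i\beta_i\right), \\
\beta_i &= \frac{\alpha A_i + q_i\rho_i d^{\tau_i}}{\gamma_i}
         = \frac{d^{\tau_i} q_i}{\gamma_i}\,(\rho_i + \alpha\lambda_i) + \alpha\eta_i .
\end{align*}
% The searcher minimizes, so ij is preferred to ji exactly when p_i\beta_i > p_j\beta_j.
% As \alpha \to 0: d^{\tau_i} \to 1 and \gamma_i \to \tau_i, giving \beta_i \to \rho_i q_i/\tau_i.
```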

This does not imply that either of these sequences is necessarily the optimum one. On the other hand, it is not unreasonable to assume that the optimum strategy will require a look into box i before a look into box j when this occurs. Carrying this reasoning a little further, we should expect the optimum search rule to require the next look to be into that box for which p_i β_i is a maximum.

The above argument does not, of course, prove that this is indeed the optimum search rule. It has provided a convenient means for deriving the form of the expression β_i, however, and with this expression it is not too difficult to prove that the above search rule is indeed the optimum one. The proof is contained in Appendix A. Since the optimum search rules of F^∞ developed previously for the two-box game are special cases of this rule, this proof also establishes their validity.

As a result of the simplicity of the optimum search rule (the searcher should merely look into a box for which p_i β_i is a maximum), many interesting properties can be developed. In order to do this, we must first examine more closely the behavior of the state vector P as a function of a sequence of unsuccessful looks. For any given P = {p_i}, we may define an associated set {ξ_i} which satisfies the equations p_1 ξ_1 = p_2 ξ_2 = ... = p_N ξ_N, ξ_i > 0 for all i. Such a set is not unique unless it is normalized, but any set of this form will uniquely determine a P belonging to the probability simplex. An equation of the form p_i ξ_i = p_j ξ_j defines a hyperplane that intersects all but the i-th and j-th vertices and, in addition, the line joining these remaining two. Since P must belong to all of these hyperplanes, it lies at their common point of intersection. Although there are N(N - 1) such hyperplanes, only N - 1 are independent and any arbitrary set will have a unique point of intersection.
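The search rule of game F^∞ (look next into a box with the largest p_i β_i) is simple enough to sketch directly. The code below assumes the index has the form β_i = d^{τ_i} q_i (ρ_i + α λ_i)/γ_i + α η_i with d = e^{-α}, which reduces to ρ_i q_i/τ_i when α = 0; function names and sample values are hypothetical.

```python
from math import exp, expm1

def beta(rho, eta, lam, tau, q, alpha):
    """Index beta_i of one box; alpha = 0 means no discounting."""
    if alpha == 0.0:
        return rho * q / tau
    d_tau = exp(-alpha * tau)              # d**tau with d = e**(-alpha)
    gamma = -expm1(-alpha * tau) / alpha   # effective search time (1 - d**tau)/alpha
    return d_tau * q / gamma * (rho + alpha * lam) + alpha * eta

def next_look(p, betas):
    """Index of a box with the largest p_i * beta_i (lowest index on ties)."""
    scores = [pi * bi for pi, bi in zip(p, betas)]
    return max(range(len(p)), key=scores.__getitem__)

# Hypothetical three-box example without discounting.
betas = [beta(1.0, 0.0, 0.0, 1.0, 0.4, 0.0),
         beta(1.0, 0.0, 0.0, 2.0, 0.9, 0.0),
         beta(1.0, 0.0, 0.0, 1.0, 0.2, 0.0)]
choice = next_look([0.5, 0.3, 0.2], betas)
```

With these numbers the scores are 0.20, 0.135 and 0.04, so the first box is examined next.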

The most interesting property of these hyperplanes concerns the way in which P is transformed from one to another by a sequence of unsuccessful looks. If P is defined by the set {ξ_i}, the a posteriori P' resulting from an unsuccessful look into box k must satisfy the equations

$$\frac{p_k'\,\xi_k}{r_k} = p_i'\,\xi_i, \quad i \ne k; \qquad p_i'\,\xi_i = p_j'\,\xi_j, \quad i, j \ne k.$$

If P originally belongs to the hyperplane p_i ξ_i = p_j ξ_j, it will not leave it until either box i or box j is examined. If an arbitrary sequence includes k_i looks into box i and k_j looks into box j, the a posteriori P' will belong to the hyperplane

$$\frac{p_i'\,\xi_i}{r_i^{k_i}} = \frac{p_j'\,\xi_j}{r_j^{k_j}}.$$

This condition applies even if the arbitrary sequence includes looks into other boxes.

For a given pair (i, j), all hyperplanes of the form p_i ξ_i = p_j ξ_j, or (ξ_i, ξ_j) for short, intersect the line joining the i-th and j-th vertex, and they can be ordered by their intersection along this line. If (ξ_i'/ξ_j') < (ξ_i/ξ_j), the hyperplane (ξ_i', ξ_j') intersects this line at a point closer to the i-th vertex (where p_i = 1), and we can say that (ξ_i', ξ_j') lies on the i-th side of (ξ_i, ξ_j). Similarly, the vector P lies to the i-th side of (ξ_i, ξ_j) if (p_j/p_i) < (ξ_i/ξ_j).

Any vector that lies on the i-th side of (β_i, β_j) will remain there until box i is examined, and the searcher's optimum strategy will require at least one look into box i before box j is examined for the first time. Therefore, by ordering the terms p_i β_i in decreasing magnitude for a given P, we can tell more about the associated optimum sequence than merely which box should be examined first.

As an illustration of the manner in which the simplex is partitioned into a set of hypervolumes over each of which a particular next look is optimum, let us consider the three-box case. Here, the simplex is an equilateral triangle, and a hyperplane (ξ_i, ξ_j) is a line joining the k-th (k ≠ i, j) vertex to the line connecting the i-th and the j-th. A hyperplane (β_i, β_j) partitions this triangle into two parts. If P lies to one side, box i will be examined at least once before box j is, and so forth. The simplex is of the form shown in Fig. 23. The part of each hyperplane (β_i, β_j) over which the first look can be made into either box i or j is indicated by a solid line, and the three solid-line segments of this type partition the triangle into the three areas in which a particular next look is optimum. If P belongs to the broken section of the line (β_i, β_j), the next optimum look will be made into some other box, i.e., the remaining one. As long as 0 < p_i, p_j < 1, both probabilities will be increased in constant ratio by such a look. Eventually, such a point will be transformed into the solid section by an optimum sequence of looks into the other box and a look into box i or box j will then be optimum.


Fig.  23.  The  three-box  simplex: 
the  optimum  next  look. 


At the point P_0, where the three hyperplanes intersect, a look into any box is optimum, and each box should be examined once during the first three looks. In the more general N-box case, any of the N! possible orderings of one look into each box is optimum during the first N looks.

A recurrent region can be defined for the N-box game. This region consists of the minimum hypervolume from which no P belonging to it can be removed by an optimum search sequence (of unsuccessful looks) and into which any other P not belonging to a boundary of the simplex must eventually be transformed. We shall first consider the form of this region when all the detection probabilities are less than one. When P belongs to the hyperplane (ξ_i, ξ_j) it can be transformed only to the j-th side of it by a look into box i. For a look into box i to be optimum, however, P must belong to or lie to the i-th side of the hyperplane (β_i, β_j). It follows that no P can be transformed to the j-th side of the hyperplane [(β_i/r_i), β_j] by an optimum look. Similarly, we see that no P can be transformed to the i-th side of the hyperplane [β_i, (β_j/r_j)] by an optimum look once it lies on or to the j-th side of it. Therefore, the hypervolume

$$r_i \le \frac{p_i\beta_i}{p_j\beta_j} \le \frac{1}{r_j}$$

must contain the recurrent region. In fact, the recurrent region must clearly consist of that hypervolume which satisfies this requirement for all pairs (i, j). The form of the recurrent region for the three-box game (Fig. 24) is an irregular hexagon.
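The two defining properties of the recurrent region (no optimum look leaves it, and every interior point is eventually carried into it) can be illustrated numerically. The sketch assumes the region is the set where r_i ≤ p_i β_i/(p_j β_j) ≤ 1/r_j for every pair, with r_i = 1 - q_i, which is equivalent to p_i β_i ≥ r_i max_j p_j β_j for every i. The β and q values are hypothetical.

```python
def scores(p, beta):
    return [pi * bi for pi, bi in zip(p, beta)]

def in_recurrent_region(p, beta, q, tol=1e-9):
    """True if p_i*beta_i >= r_i * max_j p_j*beta_j for every box i."""
    s = scores(p, beta)
    top = max(s)
    return all(si >= (1.0 - qi) * top * (1.0 - tol) for si, qi in zip(s, q))

def greedy_look(p, beta, q):
    """One unsuccessful look into a box with the largest p_i*beta_i."""
    s = scores(p, beta)
    i = max(range(len(p)), key=s.__getitem__)
    w = [pk * (1.0 - q[k]) if k == i else pk for k, pk in enumerate(p)]
    total = sum(w)
    return [x / total for x in w]

beta = [1.0, 1.3, 0.7]          # hypothetical indices
q = [0.3, 0.5, 0.6]             # hypothetical detection probabilities
inv = [1.0 / b for b in beta]
p_center = [x / sum(inv) for x in inv]   # point where all p_i*beta_i are equal

p_far = [0.98, 0.01, 0.01]      # interior point well outside the region
trajectory = [p_far]
for _ in range(50):
    trajectory.append(greedy_look(trajectory[-1], beta, q))
```

In this example the far point enters the region after a run of looks into the first box and stays there afterwards.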

Since the recurrent region is convex and bounded by a set of linear hyperplanes, it can be generated by a set of extreme points. It can be shown that these extreme points are the 2^N - 2 possible points into which P_0 can be transformed during the first N - 1 looks of an optimum sequence.

If there exists a set {n_i} such that r_1^{n_1} = r_2^{n_2} = ... = r_N^{n_N}, the optimum search sequence will be periodic for any P belonging to the recurrent region. A set can always be found that satisfies the above equations to any desired degree of accuracy as long as each q_i is less than one. In each period of such a sequence, each box i will be examined n_i times. A set of hyperplanes partitions this region into a set of hypervolumes, each having a different sequence of this type associated with it. The form of this partition can be found once we note that the relative ordering of looks into boxes i and j within the optimum sequence is invariant over a hyperplane of the form (ξ_i, ξ_j). A set of hyperplanes of this form must, therefore, partition the recurrent region into a set of hypervolumes, over each of which the relative order of looks into boxes i and j is unique and periodic. Each hyperplane is uniquely determined by the point at which it intersects the line joining the i-th and j-th vertex, and the members of the set may be ordered by these points of intersection in exactly the same manner as the breakpoints in the two-box game were. Each such separating hyperplane is transformed into the hyperplane (β_i, β_j) by an optimum sequence.

As an example, let us consider the three-box case where r_1^3 = r_2^2. The hypervolume

$$r_1 \le \frac{p_1\beta_1}{p_2\beta_2} \le \frac{1}{r_2}$$

(which contains the recurrent region) is partitioned into n_1 + n_2 = 5 hypervolumes where three (n_1) lie to the first side and two (n_2) lie to the second side of (β_1, β_2). This partitioning is illustrated in Fig. 25, where the relative ordering of the looks into boxes 1 and 2 is shown for each hypervolume. Along the line connecting vertex one with vertex two, p_1 + p_2 = 1, and the game behaves as though these were the only two boxes involved. Thus, we can determine the positions of each of the separating hyperplanes and the ordering of looks into these two boxes within each hypervolume by using the techniques of Chapter 2. A look into box 1 transforms P two (n_2) hypervolumes in the direction of vertex 2, and so forth. If one wishes to find the relative ordering of such looks outside the region enclosed by the hyperplanes [(β_1/r_1), β_2] and [β_1, (β_2/r_2)], one can, of course, partition these exterior regions by using the appropriate techniques, which are analogous to those used in the two-box case.

In order to complete the partitioning of the recurrent region, we need only continue the above process for all pairs (i, j). When r_1^3 = r_2^2 = r_3 in our three-box example, the general form of the resulting partition is that shown in Fig. 26. In this particular example, we find that five different periodic chains, shown below, occur within the periodic region:


Associated with each of these chains are Σ n_i = 6 hypervolumes, the sequence of each hypervolume having a different phase. The hypervolumes belonging to each of these chains are indicated in the figure, and it is worth noting that all of those belonging to the same chain have the same general configuration. A more important property to note, however, is that there must always exist more than one periodic chain within the recurrent region when there are more than two boxes. Thus, the state vector P will not enter each of the hypervolumes of the recurrent region during one period but will occupy only a subset of them. This contrasts rather strongly with the behavior of the two-box game.
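The periodic behavior in the three-box example can be reproduced with a short simulation. The sketch assumes the example has r_1^3 = r_2^2 = r_3 (so n = 3, 2, 1 and Σ n_i = 6) and applies the rule "look into a box with the largest p_i β_i" on a logarithmic scale, where an unsuccessful look into box i simply adds log r_i to that box's score and the common normalization constant can be ignored. The starting point and β values are hypothetical.

```python
from math import log

def greedy_sequence(p, beta, r, n_looks):
    """Boxes examined by the largest-p_i*beta_i rule, all looks unsuccessful."""
    # log(p_i * beta_i) up to a common additive constant
    score = [log(pi * bi) for pi, bi in zip(p, beta)]
    log_r = [log(ri) for ri in r]
    seq = []
    for _ in range(n_looks):
        i = max(range(len(score)), key=score.__getitem__)
        seq.append(i)
        score[i] += log_r[i]   # posterior weight of box i shrinks by r_i
    return seq

x = 0.9
r = [x ** 2, x ** 3, x ** 6]           # r_1^3 = r_2^2 = r_3 = x^6
seq = greedy_sequence([0.5, 0.3, 0.2], [1.0, 1.3, 0.7], r, 120)
tail = seq[60:]                        # discard the initial transient
period = tail[:6]
looks_per_box = [period.count(k) for k in range(3)]
```

After the transient, the sequence repeats every six looks and each period contains three, two and one looks into the three boxes. Which of the periodic chains appears depends on the starting point.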

If some of the detection probabilities are equal to one, the recurrent region assumes a new form, and the general behavior of the optimum search sequences within it is also somewhat different. When this occurs, it is convenient to separate the boxes into two sets, letting S' include those which have unity detection probabilities and letting S include the others. Boxes belonging to S' can be examined once, at most, and after each of them has been searched, P must belong to the subsimplex generated by the boxes belonging to S. The recurrent region must belong to this subsimplex. It is defined by the relations p_i = 0 for all i belonging to S', and

$$r_i \le \frac{p_i\beta_i}{p_j\beta_j} \le \frac{1}{r_j}$$

for all pairs (i, j) belonging to S. Within this region, the optimum search sequence involves looks into boxes belonging to S and is periodic if there exists a set {n_i} such that r_i^{n_i} is the same for all i belonging to S. This recurrent region is partitioned into a set of hypervolumes with unique sequences in the same way as before.

Now that the general behavior of the optimum search strategy has been discussed at some length, it is appropriate to turn to the problem of evaluating the payoff function U^∞(P). For any fixed search sequence, the payoff is linear in P, and if we let π_m represent a hypervolume over which a given sequence is optimum, we can express the associated payoff function in the form

$$U_m(P) = \sum_{j=1}^{N} a_m(j)\, p_j .$$

Here, a_m(j) equals the payoff that results if the evader is actually hiding in box j.

If the infinite optimum search sequence associated with π_m is defined, as in Chapter 7, by the set {T_i(j)}, where T_i(j) represents the time at which the j-th look into box i is completed, each coefficient a_m(i) can be expressed in terms of an infinite series as follows:

a  (i)  =  q 
m  ^ 


j=l  '  '  k=l 


(8-4) 


When α = 0 (no discounting), this equation reduces to the form

$$a_m(i) = q_i \sum_{j=1}^{\infty} r_i^{\,j-1}\left\{\rho_i T_i(j) - j\,\eta_i\tau_i - \lambda_i\right\}. \tag{8-5}$$

When a finite sequence transforms π_m into π_n, the payoff U_m(P) may be expressed as a function of U_n(P). Letting {t_i(j)} represent this finite sequence and k_i the total number of looks into box i that are included, we find that

and Eq. (8-7) reduces to


$$a_m(i) = (\rho_i - \eta_i)\tau_i - q_i\lambda_i + r_i\, a_n(i),$$

$$a_m(j) = \rho_j\tau_i + a_n(j), \quad j \ne i. \tag{8-9}$$

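For a periodic sequence, the single-look recursion of Eq. (8-9) can be folded backward over one period and closed on itself to give the payoff coefficients directly. The sketch below covers only the undiscounted case (α = 0) and assumes that Eq. (8-9) describes one look into box i with r_i = 1 - q_i. The function name and test values are illustrative.

```python
def periodic_coefficients(period, rho, eta, lam, tau, q):
    """Payoff a(j) for each hiding place j when `period` is repeated forever.

    Undiscounted case only. Every box must appear in the period, otherwise
    an evader hiding in a skipped box is never found and the payoff diverges.
    """
    n = len(rho)
    if set(period) != set(range(n)):
        raise ValueError("every box must appear in the period")
    const = [0.0] * n   # payoff collected during one period
    carry = [1.0] * n   # probability of surviving one period undetected
    for i in reversed(period):          # fold the looks from last to first
        for j in range(n):
            if j == i:
                const[j] = ((rho[j] - eta[j]) * tau[j] - q[j] * lam[j]
                            + (1.0 - q[j]) * const[j])
                carry[j] *= 1.0 - q[j]
            else:
                const[j] += rho[j] * tau[i]
    # Periodicity: a = const + carry * a, hence a = const / (1 - carry).
    return [c / (1.0 - k) for c, k in zip(const, carry)]

# Two identical boxes examined alternately, payoff = time until detection.
a = periodic_coefficients([0, 1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0],
                          [1.0, 1.0], [0.5, 0.5])
```

With unit earning rate and no losses the coefficients are expected detection times: 3 time units if the evader hides in the box examined first and 4 if he hides in the other.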
In order to find the evader's optimum strategy in F^∞ and both players' good strategies in G^∞, the space over which U^∞(P) is a maximum must be located. Since U^∞(P) is linear within each hypervolume over which a single sequence is optimum, the payoff must attain its maximum at at least one extreme point common to a set of such hypervolumes. If this occurs at more than one such point (an unlikely event), these points will generate a space over which U^∞(P) is constant. If this space is of degree N - 1, it will consist of a single hypervolume π_m. Otherwise, it will form a boundary, of the appropriate degree, that is common to a set of such hypervolumes. The evader's optimum strategy in F^∞ and his good strategy in G^∞ consist in selecting any P belonging to this maximum space. The searcher, on the other hand, must find a probability distribution Y_0 for selecting one of the sequences that are optimum over this space. This distribution must cause the expected payoff to be independent of P and hence equal to max U^∞(P) = V^∞. Such a distribution must exist, since U^∞(P) is convex.

Our principal problem is to find a single extreme point at which U^∞(P) = V^∞. The general approach to such a problem is fairly simple. Any extreme point lies at the intersection of at least N - 1 hyperplanes (N - 1 being independent), each of the form (ξ_i, ξ_j) and having the property that the associated optimum sequences will transform it at some point into (β_i, β_j). Radiating from such a point are a number of rays, each formed by the intersection of N - 2 of the independent hyperplanes. If U^∞(P) is nonincreasing as P moves along each such ray away from the extreme point, this point must be an extreme point of the space over which U^∞(P) is a maximum. If U^∞(P) is strictly decreasing along each ray, it must be the unique point at which U^∞(P) is a maximum. One can start at a known extreme point, P_0 for example, find the ray along which U^∞(P) increases most rapidly, and the next extreme point along this ray. The process can be repeated until an extreme point is found that satisfies the required property. Any pair of extreme points is connected by a network of such rays, and this process will eventually locate the desired point.

Although this process is simple in principle, the computational effort required can quickly reach astronomical proportions as N and Σ n_i increase. Radiating from each extreme point are at least 2(N - 1) rays and at an extreme point such as P_0 there are 2^N - 2 rays. The task of computing the derivative of U^∞(P) along one of these rays is not easy, even after an associated optimum sequence along this ray has been found. Also, the total number of extreme points belonging to the recurrent region alone can be tremendous, even in artificial examples where Σ n_i is small. The location of the maximum point within a single hypervolume requires a linear programming routine of no mean size when N is large, and the task of locating a maximum point for the whole simplex can quickly exceed the capabilities of even the largest and fastest computers.

As a result of these considerations, it would be advisable to develop an efficient method for deriving approximately good strategies for the two players. The location of a point reasonably close to the maximum space would suffice as an approximation to the evader's good strategy. Any payoff U_m(P) = Σ_{j=1}^{N} a_m(j) p_j that is associated with some optimum sequence has the property that min_j {a_m(j)} ≤ V^∞ ≤ max_j {a_m(j)}. The sequence limits the evader to max_j {a_m(j)}. If the sequence selected is optimum at a point near the maximum space, the quantity max_j {a_m(j)} - min_j {a_m(j)} is likely to be small, and with it the searcher should be able to limit the evader to a payoff fairly close to V^∞. In order to get a better solution, the optimum sequences associated with a number of points about the maximum region could be found, and from them a random selection could be made that would yield an expected payoff independent of P. Such sets do exist as long as there are no inadmissible boxes. Although the resulting payoff will be larger than V^∞, it will be less than the maximum over P of any of the individual payoffs.

As shown in Chapter 7, discounting introduces the possibility that some boxes may be
inadmissible. If the evader hides in box i and the searcher uses the good strategy that applies when
this box is eliminated, the evader will never be found and will receive a payoff equal to
$\rho_i/\alpha$. If, on the other hand, the evader also uses his good strategy that applies when box i
is eliminated, the resulting payoff will equal $V^{\infty\prime}$, the value of the reduced game. If
$\rho_i/\alpha < V^{\infty\prime}$, box i is inadmissible, and the good strategies and value of both
the original and the reduced game are identical. This condition is both necessary and sufficient.
When more than one box is inadmissible, the good strategies and values will be those which apply
when all such boxes are removed, and $\rho_i/\alpha < V^{\infty\prime}$ for each such box.

Since the above condition depends on the value of the game, or at least on the value of a
reduced game that may involve more than one box, there is no simple method for finding the
inadmissible boxes when they exist. A stronger but quite simple condition exists, however, that
may reveal the presence of such a box in an extreme situation. The value of $G^\infty$ cannot increase
as boxes are removed and must, therefore, be greater than or equal to that which applies when only
one remains. If both players restrict themselves to box j, the resulting payoff is

$$\frac{(\rho_j - \pi_j)\,\gamma_j - q_j d^{\tau_j} \lambda_j}{1 - r_j d^{\tau_j}} .$$

If there exists a pair of boxes where

$$\frac{\rho_i}{\alpha} < \frac{(\rho_j - \pi_j)\,\gamma_j - q_j d^{\tau_j} \lambda_j}{1 - r_j d^{\tau_j}} , \tag{8-10}$$

box j dominates box i, and the latter must be inadmissible.

It should be observed that the presence of inadmissible boxes is unlikely unless the interest
rate is very high or the earning rates are highly biased. The possibility exists as long as $\alpha$ is
unequal to zero, however, and must be taken into account if one wishes to develop a method for
finding strategies that approximate the good ones.

While on the subject, it is worthwhile to look ahead and note that V, the value of game
G, is a function of the moving costs. A box may be inadmissible with one set of moving costs
but not with another. Since V is monotonically nonincreasing as the moving costs increase, any
box inadmissible in $G^\infty$ will also be inadmissible in G and G°.


8.3 GAME G°


In this section we turn again to the other limiting form of the search evasion game, G°. The
N-box form of this game is fortunately quite similar to the two-box form, which we have
considered previously, and we also have the good fortune to learn that exact solutions can be found.
In order to do this, we must find a state vector P that maximizes the evader's guaranteed payoff
and another probability vector Y with which the searcher can limit the evader to the same amount.
The procedures used in the two-box game require little modification. We shall again consider
the evader's good strategy first.

As we have mentioned and justified previously in Sec. 3.2, the evader's good strategy in G°
must belong to the class of strategies in which the state vector P is returned to the same position
after each unsuccessful look. If the searcher knows the position of this vector, he may use this
information in selecting an optimum search sequence. Since the same P applies before each look,
a look that is optimum once is always optimum. For a given P, the searcher can limit the evader
to $U^\circ(P) = \min_i \{U^\circ(P; i)\}$, where $U^\circ(P; i)$ is the payoff that results if the
searcher always looks into box i. This payoff is

$$U^\circ(P; i) = \frac{\gamma_i \sum_{j=1}^{N} p_j \rho_j - p_i\,(\gamma_i \pi_i + q_i d^{\tau_i} \lambda_i)}{1 - d^{\tau_i}(1 - p_i q_i)} .$$

With no discounting, it reduces to

$$U^\circ(P; i) = \frac{\tau_i \sum_{j=1}^{N} p_j \rho_j - p_i\,(\tau_i \pi_i + q_i \lambda_i)}{p_i q_i} .$$

The evader's good strategy maximizes the guaranteed payoff, and therefore corresponds to that
P which maximizes $U^\circ(P)$.

Since each function $U^\circ(P; i)$ is nonlinear in P for reasons discussed in Sec. 6.3, one cannot
be hasty in forming any conclusions regarding the location of the optimum vector. However, the
following properties, developed in Appendix D, come to the rescue:

(a) Define $P_0$ as a point belonging to the probability simplex that is a solution
of the equations

$$U^\circ(P; 1) = U^\circ(P; 2) = \dots = U^\circ(P; N) .$$

At least one $P_0$ must exist, and each one must belong to the interior
of the simplex.

(b) All boxes are admissible if and only if there exists a $P_0$ that is the unique
point in the simplex at which $U^\circ(P)$ is a maximum. When this occurs, $P_0$
must also be the unique point that satisfies the definition in part (a).

(c) If any inadmissible boxes exist, there must be at least one for which

$$\frac{\rho_i}{\alpha} < U^\circ(P_0) < \max_P U^\circ(P) = V^\circ .$$

This statement applies for any $P_0$.

(d) In the subsimplex generated by the admissible boxes, there exists a
unique P where $U^\circ(P) = V^\circ$.


The procedure for obtaining the evader's good strategy is, therefore, clear. The set of
equations above must first be solved to obtain a trial $P_0$. When discounting is not used, this is
automatically the solution. With discounting, a check must be made to see whether there are any
boxes for which $\rho_i/\alpha < U^\circ(P_0)$. If none exists, the correct solution has been found. If,
on the other hand, such boxes do appear, they must be inadmissible and should be eliminated. The
above set of equations can then be solved in the reduced game. There is no guarantee that all of
the inadmissible boxes, if there are more than one, will be found on the first attempt. The process
can be repeated, however, until no more appear. When this occurs, the correct solution has
been found. With it, the evader will never hide in any of the inadmissible boxes.

In the preceding list of properties, $P_0$ is not claimed to be unique in general because the
author was unable to prove uniqueness in general. The method for deriving the evader's good
strategy does not require this property to be true. It should be stated, however, that the author
would be somewhat surprised if an example were found where $P_0$ was not unique.

Just as in the two-box form of G°, the searcher's good strategy must belong to that class in
which each look is selected according to a probability distribution $Y = \{y_i\}$, this distribution
being independent of the past search sequence. In a manner analogous to that just used, we can let
$W^\circ(Y; i)$ represent the payoff that results if the searcher uses Y and the evader hides in box i.
With such a distribution, the searcher limits the evader to $W^\circ(Y) = \max_i W^\circ(Y; i)$, and his
good strategy is that which minimizes $W^\circ(Y)$. The expression for $W^\circ(Y; i)$ is

$$W^\circ(Y; i) = \frac{\rho_i \sum_{j=1}^{N} y_j \gamma_j - y_i\,(\gamma_i \pi_i + q_i d^{\tau_i} \lambda_i)}{1 - \sum_{j \ne i} y_j d^{\tau_j} - y_i r_i d^{\tau_i}} ,$$

which reduces to

$$W^\circ(Y; i) = \frac{\rho_i \sum_{j=1}^{N} y_j \tau_j - y_i\,(\tau_i \pi_i + q_i \lambda_i)}{y_i q_i}$$

when discounting is not used.

In contrast to the set $\{U^\circ(P; i)\}$, these payoff functions do not intersect at a common point
that satisfies the probability constraints on Y if there are any inadmissible boxes. This should
cause no difficulty, however, for all the inadmissible boxes, if any, can be found if the evader's
good strategy is calculated first. In the reduced game in which all boxes not belonging to S, the
set of admissible boxes, have been removed, all of the payoff functions $W^\circ(Y; i)$ must intersect
at a unique point $Y_0$. At this point, where $y_j = 0$ if j does not belong to S,
$W^\circ(Y_0; i) = W^\circ(Y_0)$ for all i belonging to S. In Appendix D it is shown that this point
must exist. It is also shown that $W^\circ(Y_0) = \min_Y W^\circ(Y) = \max_P U^\circ(P) = V^\circ$.
Hence, this $Y_0$ is the searcher's good strategy in the original game.

Once the evader's good strategy is known, the searcher's can be found more easily, for V°
is then known. After removing all of the inadmissible boxes and renumbering the remaining ones
from one to N', if necessary, we can write a set of equations, each of the form

$$\frac{\rho_i \sum_{j=1}^{N'} y_j \gamma_j - y_i\,(\gamma_i \pi_i + q_i d^{\tau_i} \lambda_i)}{1 - \sum_{j \ne i} y_j d^{\tau_j} - y_i r_i d^{\tau_i}} = V^\circ .$$

These can be rewritten in the form

$$\sum_{j \ne i} y_j\,(\rho_i \gamma_j + V^\circ d^{\tau_j}) + y_i\,[\gamma_i(\rho_i - \pi_i) - q_i d^{\tau_i} \lambda_i + V^\circ r_i d^{\tau_i}] = V^\circ .$$

Each such equation is linear in the $y_j$, and the solution may be obtained by inverting an
N'-by-N' matrix.
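
The linear step can be illustrated with a short sketch. It is restricted, purely for brevity, to the case without discounting and with $\pi_i = \lambda_i = 0$, so that $d = 1$ and $\gamma_j = \tau_j$; these simplifications, the function names `solve_linear` and `searcher_strategy`, and the sample data are assumptions made for the illustration. V° is treated as a known input.

```python
from fractions import Fraction as F


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def searcher_strategy(rho, tau, q, value):
    """Look probabilities Y that make W(Y; i) equal to `value` for every box i.

    Assumes no discounting and pi_i = lambda_i = 0, so row i of the system is
    sum_{j != i} y_j (rho_i tau_j + V) + y_i (rho_i tau_i + V r_i) = V.
    """
    n = len(rho)
    matrix = [
        [rho[i] * tau[j] + value * ((1 - q[i]) if i == j else 1) for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(matrix, [value] * n)


if __name__ == "__main__":
    rho = [F(1), F(2), F(1)]
    tau = [F(1), F(1), F(2)]
    q = [F(1, 2), F(1, 2), F(1, 4)]
    print(searcher_strategy(rho, tau, q, F(14)))
```

For these sample data, with V° = 14, the solution is Y = (1/5, 2/5, 2/5). The system does not impose the normalization explicitly; the solution sums to one because V° is the correct value.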

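In the simplified game where $\alpha$ and each $\pi_j$ and $\lambda_j$ are equal to zero, the good
strategies may be solved algebraically. The solution that results is

$$P_0 = \{p_i\} , \qquad p_i = \frac{\tau_i/q_i}{\sum_{j=1}^{N} \tau_j/q_j} ,$$

$$Y_0 = \{y_i\} , \qquad y_i = \frac{\rho_i/q_i}{\sum_{j=1}^{N} \rho_j/q_j} ,$$

$$V^\circ = \sum_{j=1}^{N} \frac{\rho_j \tau_j}{q_j} .$$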
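
As a check on the algebraic solution, the sketch below evaluates the undiscounted payoffs $U^\circ(P; i)$ and $W^\circ(Y; i)$ with $\pi_i = \lambda_i = 0$ at the point where $p_i$ is proportional to $\tau_i/q_i$ and $y_i$ is proportional to $\rho_i/q_i$, which is how we read the solution above. The function names and the sample data are assumptions made for the illustration.

```python
from fractions import Fraction as F


def simplified_g0_solution(rho, tau, q):
    """Closed-form P0, Y0 and value for alpha = 0 and pi_j = lambda_j = 0."""
    tau_over_q = [t / d for t, d in zip(tau, q)]
    rho_over_q = [r / d for r, d in zip(rho, q)]
    p0 = [x / sum(tau_over_q) for x in tau_over_q]
    y0 = [x / sum(rho_over_q) for x in rho_over_q]
    value = sum(r * t / d for r, t, d in zip(rho, tau, q))
    return p0, y0, value


def u0(p, i, rho, tau, q):
    """Payoff when the evader holds state vector p and the searcher always looks into box i."""
    return tau[i] * sum(pj * rj for pj, rj in zip(p, rho)) / (p[i] * q[i])


def w0(y, i, rho, tau, q):
    """Payoff when the searcher uses look distribution y and the evader stays in box i."""
    return rho[i] * sum(yj * tj for yj, tj in zip(y, tau)) / (y[i] * q[i])


if __name__ == "__main__":
    rho = [F(1), F(2), F(1)]
    tau = [F(1), F(1), F(2)]
    q = [F(1, 2), F(1, 2), F(1, 4)]
    p0, y0, value = simplified_g0_solution(rho, tau, q)
    print(p0, y0, value)
    print([u0(p0, i, rho, tau, q) for i in range(3)])
    print([w0(y0, i, rho, tau, q) for i in range(3)])
```

Every payoff evaluates to the same number, so neither player can gain by deviating to a single box, which is the equalizing property used throughout this section.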

In the more general case, a numerical routine is necessary to find the evader's good strategy.
Such a routine should not be too difficult to establish, for the set of functions $\{U^\circ(P; i)\}$ is
fairly well behaved within the probability simplex. The function $U^\circ(P; i)$ is linear over any
hyperplane where $p_i$ is fixed and is monotonic along any ray extending from the i-th vertex. Also,
$U^\circ(P; i)$ is equal to a constant over a linear hyperplane.




The good search strategy of G° can be useful when the moving costs are unequal to zero.
If the searcher makes each look according to the same probability distribution Y, the evader
does not need to move, even in G°. Rather, he can collect a maximum payoff by remaining in
one box. The searcher's good strategy $Y_0$ in G° allows the evader to remain in any admissible
box. Therefore, $Y_0$ provides a simple search strategy that limits the evader to V° when the
moving costs are unequal to zero.

The good search strategy of G° can also be useful in more practical situations, for it can be
used to limit the evader to V° when some of the restrictions imposed by our game model are
violated. For example, the evader may not have to hide before the game starts. He may be able
to wait until the search process has started and choose a favorable time at which to enter the
game. Also, he may be able to stop playing the game temporarily. For example, he may be
able to suspend production temporarily while remaining in the same box. Although his earning
rate would go to zero, perhaps the detection probability would, also. In some situations it may
be cheaper to stop playing than to move, and it may provide a worthwhile evasive device. If the
searcher uses his good strategy associated with G°, however, such devices are of no help to the
evader. Either the evader should remain in an admissible box for all time or he should not play
the game at all. Our search evasion game was motivated by a problem in which the searcher
would be very happy if he deterred the evader from playing the game. Naturally, if V° is negative,
the evader can receive a larger payoff (zero) if he does not play the game.

Another requirement imposed in our search evasion game was that the moving cost had to
be incurred at the time that a move was made. In a more practical situation, a moving cost may
result from a decrease in the earning rate over a period of time after the move. It would be
extremely difficult to solve a game with this feature. If the good search strategy of G° is used,
however, moving can never help the evader. Therefore, this good strategy will limit the evader
to V° in this situation also.

8.4  GAME  G 

When the evader must incur a moving cost whenever he moves, the N-box search evasion
game becomes exceedingly complex. In Sec. 8.2, we found that $G^\infty$ could be rather complicated
even though its general properties were simple extensions of those of the two-box form. In game
G, we are not faced merely with an increase in the size of the problem. Some additional
complications arise that do not exist in the two-box game.

In the modified games where the evader reveals the position of the state vector P to the
searcher before each look, the general approach used in Chapter 4 still holds. Here we must
associate a moving cost with each of the possible moves and can let $\mu_{ij}$ represent the cost
incurred when the evader moves from box i to box j. As before, we can let F represent the game in
which the evader can still move before the next look and F' represent that in which this opportunity
has passed. The payoff functions that apply in these games when both players use optimum future
strategies will be represented by U(P) and U'(P).

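The functional equation relating U'(P) to U(P) is

$$U'(P) = \min_i \{U'(P; i)\} ,$$

where

$$U'(P; i) = \gamma_i \sum_{j=1}^{N} p_j \rho_j - p_i\,(\gamma_i \pi_i + q_i d^{\tau_i} \lambda_i) + d^{\tau_i}(1 - p_i q_i)\, U(P')$$

and $P \xrightarrow{\;i\;} P'$; that is, P' is the state vector that results from P after an
unsuccessful look into box i.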

In F, the evader has the opportunity to move and must weigh the cost of a transformation
of the state vector against a possible increase in the future payoff. For a given P, his optimum
strategy has an associated set $\{x_{ij}\}$, where $x_{ij}$ represents the probability that he will
move to box j if he is in box i. This produces a transformation to

$$p_j' = \sum_{i=1}^{N} p_i x_{ij}$$

and has an associated cost equal to

$$\sum_{i=1}^{N} \sum_{j \ne i} p_i x_{ij} \mu_{ij} .$$

The function U(P), therefore, must satisfy the functional equation

$$U(P) = \max_{\{x_{ij}\}} \Big[ -\sum_{i=1}^{N} \sum_{j \ne i} p_i x_{ij} \mu_{ij} + U'(P') \Big] ,$$

where

$$p_j' = \sum_{i=1}^{N} p_i x_{ij} .$$

When all the moving costs are identical, the equation reduces to

$$U(P) = \max_{P'} \Big[ -\tfrac{1}{2}\mu \sum_{i=1}^{N} |p_i - p_i'| + U'(P') \Big]$$

as a result of the efficient move condition.
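
The equal-cost form can be illustrated numerically. The sketch below assumes, as the reduced equation suggests, that an efficient move never shifts probability into a box and out of it at the same time, so that the total probability moved is half the sum of the absolute changes in the state vector. The function names and the sample numbers are assumptions made for the illustration.

```python
from fractions import Fraction as F


def posterior_after_miss(p, i, q_i):
    """State vector after an unsuccessful look into box i (Bayes update)."""
    weights = list(p)
    weights[i] = weights[i] * (1 - q_i)
    total = sum(weights)
    return [w / total for w in weights]


def efficient_move_cost(p, p_new, mu):
    """Cost of transforming p into p_new when every move costs mu.

    Half of the L1 distance is the probability mass that must change boxes.
    """
    return mu * sum(abs(a - b) for a, b in zip(p, p_new)) / 2


if __name__ == "__main__":
    uniform = [F(1, 3)] * 3
    after_look = posterior_after_miss(uniform, 0, F(1, 2))
    print(after_look)
    print(efficient_move_cost(after_look, uniform, F(1)))
```

With three identical boxes and q = 1/2, an unsuccessful look into the first box gives the state (1/5, 2/5, 2/5), and restoring the uniform state costs 2/15 per unit of moving cost.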

These  functions  have  properties  quite  similar  to  those  found  in  Chapter  4.  Both  must  be 
continuous  and  convex.  Each  will  be  linear  over  a  set  of  hypervolumes  that  may  be  infinite,  and 
the  two  are  identical  over  a  no-move  region.  Outside  this  region,  the  simplex  is  partitioned  into 
a  set  of  moving  regions  within  each  of  which  a  particular  set  of  moves  is  required. 

Although  the  solution  of  these  functional  equations  would  be  a  staggering  task,  a  far  more 
difficult  problem  arises  when  we  consider  the  form  of  the  searcher's  good  strategy.  In  the  two- 
box  form  of  G,  this  strategy  could  be  generated  by  a  finite  Markov  process.  Unfortunately,  this 
cannot  be  done  when  there  are  three  or  more  boxes,  except  in  the  first  strategy  interval. 

In the two-box game there are only two moving regions, one to each side of the no-move
region. Two mixed states and two moving states, at most, are associated with the searcher's
Markov process. When both players use their good strategies, P enters a moving region only
when the Markov process occupies one of the moving states. The probability distributions
associated with the mixed states allow the required move in each state to be admissible.

In the N-box game, on the other hand, the state vector P does not simply move back and
forth along a line. Instead, it moves in an (N - 1)-dimensional space, and can enter many different
moving regions. This has rather serious implications, for it is not possible to construct a finite
Markov graph that will have a set of mixed and moving states with the required properties. In
order to illustrate the problem, we shall examine an extremely simple N-box game for which a
partial solution has been found.


8.4.1  Symmetric  Three-Box  Game  with  Simple  Reward  Structure 

Let  us  consider  the  three-box  game  where  all  the  detection  probabilities  are  the  same  and 
in  which  the  simple  reward  structure  of  Chapters  2  through  5  is  used.  Since  this  is  the  simplest 
game  involving  more  than  two  boxes,  we  can  be  sure  that  any  complications  that  arise  here  will 
arise  in  general. 

With this setup, both G° and $G^\infty$ have trivial solutions. In G°,

$$P_0 = Y_0 = \{\tfrac13, \tfrac13, \tfrac13\}$$

as a result of symmetry, and $V^\circ = 3/q$. In $G^\infty$, the evader should initially hide
according to the probability distribution

$$P_0 = \{\tfrac13, \tfrac13, \tfrac13\}$$

and the searcher should make an equally likely choice from the 3! periodic search sequences in
which the boxes are examined in order. The value is $V^\infty = (3/q) - 1$. It can be shown that
$\mu_p = 2$. Hence, when $\mu$ exceeds this value, G behaves essentially the same as $G^\infty$.

Over the first strategy interval, $0 \le \mu \le \mu_1 = 1$, G also has a simple solution. Just as in
the two-box game, the evader should return the state vector to the same point after each look. The
searcher's good strategy can be generated by a Markov process where each state is defined by
the last look. Each such state is both a mixed and a moving state. In such a state, a look into
any box is admissible, and the evader may move to the box just searched if he is in any other.
These properties hold for the first strategy interval in any game G.

In this example, symmetry causes the solution to be very simple. The evader should always
return the state vector to

$$P_0 = \{\tfrac13, \tfrac13, \tfrac13\} .$$

The boxes are identical, and letting i, j and k represent the three different boxes, we may
write

$$U'(P_0) = 1 + \Big(\frac{3 - q}{3}\Big)\, U(P') ,$$

where $P_0 \xrightarrow{\;i\;} P'$ when box i is examined. In order to return P' to $P_0$, the evader,
if in j or k, should move to box i with probability q/3. Therefore,

$$U(P') = -\mu\, \frac{2q}{3(3 - q)} + U'(P_0) .$$

It follows that $U'(P_0) = (3/q) - (2/3)\mu = U(P_0)$.

In the Markov process that generates the searcher's good strategy, the probability distribution
associated with the state $\sigma_i$ must be of the form $y_i(i) = y$, $y_i(j) = y_i(k) = (1 - y)/2$.
State $\sigma_i$ applies when the last look was into box i. The payoffs associated with $\sigma_i$ must
be of the form

$$W_i(P) = W_i'(P) = \sum_{j=1}^{N} a_i(j)\, p_j ,$$

where $a_i(i) - a_i(j) = a_i(i) - a_i(k) = \mu$. The solution reveals that
$y = (1 - \mu)/(3 - \mu)$ and that

$$W_i(P) = W_i'(P) = \frac{3}{q}\, p_i + \Big(\frac{3}{q} - \mu\Big)(p_j + p_k) .$$

The searcher should initially make an equally likely look, and he limits the evader to

$$W_0(P) = \frac{3}{q} - \frac{2}{3}\mu .$$

As long as $\mu$ does not exceed one, beyond which y is negative, these strategies are the good
strategies, and

$$V(\mu) = \frac{3}{q} - \frac{2}{3}\mu ; \qquad 0 \le \mu \le 1 = \mu_1 .$$
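
The first-interval solution can be checked by substituting it back into the one-step recursions for the coefficients. The sketch below assumes the simple reward structure (one unit of payoff per look, no discounting), the repeat probability $y = (1 - \mu)/(3 - \mu)$, and the coefficients $3/q$ for the box just examined and $3/q - \mu$ for the others; the function names are ours. Because the evader is indifferent between staying and moving at these coefficients, the recursions are written for an evader who stays put.

```python
from fractions import Fraction as F


def first_interval_strategy(q, mu):
    """Repeat probability y and payoff coefficients for the state 'last look was box i'."""
    y = (1 - mu) / (3 - mu)
    a_same = 3 / q          # evader is in the box just examined
    a_other = 3 / q - mu    # evader is in one of the other two boxes
    return y, a_same, a_other


def one_step_payoffs(q, mu):
    """Right-hand sides of the recursions; they should reproduce the coefficients."""
    y, a_same, a_other = first_interval_strategy(q, mu)
    r = 1 - q
    # Evader in box i: repeat look (prob y) misses with prob r; any other look moves the state.
    same = 1 + y * r * a_same + (1 - y) * a_other
    # Evader in box j: look into i or k leaves him "other"; look into j misses with prob r.
    other = 1 + y * a_other + (1 - y) / 2 * (r * a_same + a_other)
    # First look is equally likely among the three boxes.
    start = 1 + F(1, 3) * r * a_same + F(2, 3) * a_other
    return same, other, start


if __name__ == "__main__":
    print(first_interval_strategy(F(1, 2), F(1, 2)))
    print(one_step_payoffs(F(1, 2), F(1, 2)))
```

For q = 1/2 and $\mu$ = 1/2 the recursions return 6 and 11/2, matching the coefficients, and the initial payoff is 17/3, which equals $(3/q) - (2/3)\mu$.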

Once $\mu$ exceeds $\mu_1 = 1$, real problems arise, for the searcher's good strategy may no longer
be generated by a simple Markov process. The evader's good strategy is fairly simple, however,
and we shall derive it first. The general form of this strategy can be guessed if one considers
the behavior of the searcher's good strategy when $\mu = \mu_1$. At this point, y = 0 and the searcher
never repeats the same look twice in a row. This indicates that the evader should no longer
restore P to the point at which the payoff is indifferent to any next look. Rather, if box i was
examined last, he should restore P to a point where


If the look preceding the last one was into box j, it is reasonable to assume that the searcher
will be more likely to look into box k than into box j. Therefore, let us assume that the only
admissible move after looks into boxes j and i, in that order, is from k to i. Before the look
into box i, we have

$$p_j = p , \qquad p_i = p_k = \frac{1 - p}{2} .$$

After this look,

$$p_j' = \frac{2p}{2p + (1 - p)(1 + r)} , \qquad p_i' = \frac{(1 - p)\, r}{2p + (1 - p)(1 + r)}$$

and

$$p_k' = \frac{1 - p}{2p + (1 - p)(1 + r)} .$$

The evader must transform this to

$$p_i = p , \qquad p_j = p_k = \frac{1 - p}{2}$$

by a possible move from k to i. Thus, $p_j'$ is unaffected, and

$$p_j' = \frac{2p}{2p + (1 - p)(1 + r)} = \frac{1 - p}{2} .$$

Therefore,

$$p = \frac{-(2 + r) + \sqrt{5 + 4r}}{1 - r} .$$

The correct transformation occurs if the evader moves from the box that has not been examined
during the last two looks (k) to the one just examined (i) with probability

$$x_{ki} = \frac{q\,(1 - p)}{2} .$$


At the beginning of the game, the evader should hide at the point

$$P_0 = \{\tfrac13, \tfrac13, \tfrac13\}$$

since U(P) must be a maximum at this point. If box i is examined first, the evader should move
to that box if he is in either of the others with probability $1 - (2 + r)[(1 - p)/2]$, for this
strategy transforms P to the desired point. Once two or more looks have occurred, the evader moves
only in the manner discussed in the preceding paragraph.
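
The evader's cycle in the second strategy interval can be traced numerically. The sketch below computes p from the closed form $p = [-(2 + r) + \sqrt{5 + 4r}]/(1 - r)$, applies the Bayes update for an unsuccessful look, and then derives the move probability directly from the requirement that the unexamined box end at $(1 - p)/2$ rather than from a quoted formula. Box labels 0, 1, 2 and all function names are assumptions made for the illustration.

```python
import math


def steady_state_p(q):
    """Probability kept in the box examined last, from the quadratic fixed point."""
    r = 1 - q
    return (-(2 + r) + math.sqrt(5 + 4 * r)) / (1 - r)


def posterior_after_miss(p, i, q):
    """State vector after an unsuccessful look into box i."""
    weights = list(p)
    weights[i] *= 1 - q
    total = sum(weights)
    return [w / total for w in weights]


def move_mass(p, src, dst, prob):
    """Evader in box src moves to box dst with probability prob."""
    out = list(p)
    shifted = out[src] * prob
    out[src] -= shifted
    out[dst] += shifted
    return out


def steady_cycle(q):
    """Looks were j = 1 then i = 0; the only move is from k = 2 to i = 0."""
    p = steady_state_p(q)
    before = [(1 - p) / 2, p, (1 - p) / 2]
    after_look = posterior_after_miss(before, 0, q)
    x_ki = 1 - ((1 - p) / 2) / after_look[2]
    return p, x_ki, move_mass(after_look, 2, 0, x_ki)


def first_move(q):
    """Start at the uniform state, look into box 0, then move toward box 0."""
    p = steady_state_p(q)
    r = 1 - q
    x0 = 1 - (2 + r) * (1 - p) / 2
    state = posterior_after_miss([1 / 3, 1 / 3, 1 / 3], 0, q)
    state = move_mass(state, 1, 0, x0)
    state = move_mass(state, 2, 0, x0)
    return x0, state


if __name__ == "__main__":
    print(steady_cycle(0.5))
    print(first_move(0.5))
```

In both cases the state vector ends with probability p in the box just examined and $(1 - p)/2$ in each of the others, and the computed steady-state move probability agrees with $q(1 - p)/2$.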

With this strategy, the payoff will be independent of where the searcher looks as long as he
never examines the same box twice in succession. The guaranteed payoff is

$$U(P_0) = \frac{3}{q} - \frac{2}{3}\mu + (\mu - 1)\,\frac{(3 - q)(3 - \sqrt{9 - 4q})}{6q} .$$

The payoff will equal the value as long as the searcher can limit the evader to this amount.

Let us first assume that such a search strategy exists (it does) and, in addition, that it is
Markovian in form. In the Markov graph, each state would be defined by the last two looks. Thus,
we can let $\sigma_{ji}$ represent the state that applies if the last two looks were made into boxes
j and i, where j precedes i. The searcher should not repeat a look into i. Therefore, we can let
$y_{ji}(k) = y$ and $y_{ji}(j) = 1 - y$. The payoffs associated with $\sigma_{ji}$ must be of the form

$$W_{ji}(P) = W_{ji}'(P) = a_{ji}(i)\, p_i + a_{ji}(j)\, p_j + a_{ji}(k)\, p_k ,$$

where

$$a_{ji}(i) - a_{ji}(k) = \mu \quad \text{and} \quad 0 \le a_{ji}(i) - a_{ji}(j) \le \mu .$$

By means of the usual functional equations, we can find that y does exist. The result is

$$y = \frac{1}{3 - \mu}$$

and

$$W_{ji}(P) = \frac{3}{q}\, p_i + \Big(\frac{3}{q} - 1\Big) p_j + \Big(\frac{3}{q} - \mu\Big) p_k .$$
At the beginning of the game, the searcher has no past search sequence to define a state in
the Markov chain. Therefore, we must add the states $\sigma_0$, $\sigma_1$, $\sigma_2$ and $\sigma_3$.
The state $\sigma_0$ is used at the beginning, and one of the others will apply after the first look.
Because of symmetry, $y_0(i) = 1/3$ for all i, and $y_i(j) = y_i(k) = 1/2$. With these probabilities,

$$W_i(P) = W_i'(P) = \frac{3}{q}\, p_i + \Big[\frac{3}{q} - \frac{1}{2}(1 + \mu)\Big](p_j + p_k)$$

and

$$W_0(P) = W_0'(P) = \frac{3}{q} - \frac{1}{3}(1 + \mu) .$$

Since $W_0(P) > U(P_0)$ when $\mu > \mu_1$, the strategies we have developed for the two players
cannot both be good strategies. The reason for our difficulty becomes apparent if we consider
what happens after the first look. If box i is examined first, the evader must move to it, if not
already there, with a probability that transforms P to $p_i = p$, $p_j = p_k = (1 - p)/2$. On the
other hand,

$$W_i(P) = \frac{3}{q}\, p_i + \Big[\frac{3}{q} - \frac{1}{2}(1 + \mu)\Big](p_j + p_k)$$

and

$$[a_i(i) - a_i(j)] = [a_i(i) - a_i(k)] = \tfrac{1}{2}(1 + \mu) < \mu$$

when $\mu > 1$. The searcher's strategy, therefore, causes the necessary moves after the first
look to be inadmissible.

These moves must be admissible. A little reflection shows that there can be no satisfactory
moving strategy that does not involve a move after the first look into the box just examined.
Therefore, we must find a search strategy that allows these moves. Such a strategy
does exist, in which the searcher introduces a transient into the probability y associated
with each state $\sigma_{ji}$. This strategy can be found by letting the payoffs associated with the
state $\sigma_{ji}$ be a function of the total number of looks that have occurred. Thus, we can let

$$W_{ji}^{n}(P) = W_{ji}'^{\,n}(P) = \sum_{k=1}^{3} a_{ji}^{n}(k)\, p_k .$$


We can also let the associated look probabilities be $y_{ji}^{n}(k)$ and $y_{ji}^{n}(j)$, or $y^n$ and
$1 - y^n$. In contrast to the usual recursion equations, a set of difference equations must be written
to determine the good strategy. Because of symmetry
[$a_{ji}^{n}(i) = a_{ik}^{n}(k) = a_{kj}^{n}(j), \ldots$], this set can be written in a
compact form as follows:

$$a_{ji}^{n}(i) = 1 + a_{ji}^{n+1}(j)$$

$$a_{ji}^{n}(j) = 1 + (1 - y^n)\, r\, a_{ji}^{n+1}(i) + y^n a_{ji}^{n+1}(k)$$

$$a_{ji}^{n}(k) = 1 + (1 - y^n)\, a_{ji}^{n+1}(k) + y^n r\, a_{ji}^{n+1}(i) .$$

Requiring a move from k to i to be admissible adds the equation

$$a_{ji}^{n}(i) - a_{ji}^{n}(k) = \mu .$$

The solution for $a_{ji}^{n}(j)$ is

$$a_{ji}^{n}(j) = \frac{2 + r}{1 - r} + A \Big[\frac{1 - \sqrt{5 + 4r}}{2(1 + r)}\Big]^n + B \Big[\frac{1 + \sqrt{5 + 4r}}{2(1 + r)}\Big]^n .$$

The coefficient B must be set equal to zero so that the transient will decay with time.

The freedom left by the arbitrary coefficient A allows us to make the necessary moves after
the first look admissible. That is, we can require that $a_j(j) - a_j(i) = a_j(j) - a_j(k) = \mu$,
where $\sigma_j \rightarrow \sigma_{ji}$. Since $\sigma_{ji}$ is entered after two looks have been made,
n will be two less than the total number of looks. The solution is

$$a_{ji}^{n}(i) = \frac{3}{q} - (\mu - 1) \Big[\frac{-(2 + r) + \sqrt{5 + 4r}}{1 - r^2}\Big] \Big[\frac{1 - \sqrt{5 + 4r}}{2(1 + r)}\Big]^n$$

$$a_{ji}^{n}(j) = \frac{3}{q} - 1 + (\mu - 1) \Big[\frac{3 - \sqrt{5 + 4r}}{2(1 - r)}\Big] \Big[\frac{1 - \sqrt{5 + 4r}}{2(1 + r)}\Big]^n$$

$$a_{ji}^{n}(k) = a_{ji}^{n}(i) - \mu$$


and, from the third difference equation,

$$y^n = \frac{1 + a_{ji}^{n+1}(i) - a_{ji}^{n}(i)}{q\, a_{ji}^{n+1}(i) - \mu} .$$

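With these values,

$$W_0(P) = \frac{3}{q} - \frac{2}{3}\mu + (\mu - 1)\,\frac{(3 - q)(3 - \sqrt{9 - 4q})}{6q} ,$$

which agrees with the evader's guaranteed payoff $U(P_0)$.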


This solution yields the searcher's good strategy in the first part of the second strategy
interval. As n approaches infinity, the transients that have been introduced die out, and each
term agrees with the corresponding one associated with the simpler search strategy that we tried
first. The steady-state value of $y_{ji}^{n}(k) = y^n$ is $1/(3 - \mu)$, and this approaches one as
$\mu$ approaches $\mu_p = 2$.

Unfortunately, the transient introduced into $y_{ji}^{n}(k)$ causes $y_{ji}^{0}(k)$ to equal one when
$\mu$ is less than $\mu_p$, to be exact, when

$$\mu = \frac{10 + 3r - 4r^2 + (2 + r)\sqrt{5 + 4r}}{2(4 + 2r - r^2)} .$$
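
The transient solution can be checked numerically. The sketch below assumes the decaying solution $a_{ji}^{n}(j) = (2 + r)/(1 - r) + A z^n$ with $z = (1 - \sqrt{5 + 4r})/[2(1 + r)]$ and amplitude $A = (\mu - 1)(3 - \sqrt{5 + 4r})/[2(1 - r)]$, which is our reading of the solution above. It obtains $a^n(i)$ and $a^n(k)$ from the first difference equation and the admissibility condition, solves the third difference equation for $y^n$, and then reports how well the second equation is satisfied. All function names are ours.

```python
import math


def transient_solution(q, mu, n_terms):
    """Coefficients a^n(i), a^n(j), a^n(k) and look probabilities y^n, n = 0, 1, ..."""
    r = 1 - q
    s = math.sqrt(5 + 4 * r)
    z = (1 - s) / (2 * (1 + r))                 # decaying root, |z| < 1
    amp = (mu - 1) * (3 - s) / (2 * (1 - r))    # assumed amplitude A
    a_j = [(2 + r) / (1 - r) + amp * z ** n for n in range(n_terms + 2)]
    a_i = [1 + a_j[n + 1] for n in range(n_terms + 1)]   # first difference equation
    a_k = [a - mu for a in a_i]                          # admissibility of the move k -> i
    # Third difference equation solved for y^n.
    y = [(1 + a_i[n + 1] - a_i[n]) / (q * a_i[n + 1] - mu) for n in range(n_terms)]
    return a_i, a_j, a_k, y


def second_equation_residual(q, mu, n_terms):
    """Largest violation of a^n(j) = 1 + (1 - y^n) r a^{n+1}(i) + y^n a^{n+1}(k)."""
    r = 1 - q
    a_i, a_j, a_k, y = transient_solution(q, mu, n_terms)
    return max(
        abs(a_j[n] - (1 + (1 - y[n]) * r * a_i[n + 1] + y[n] * a_k[n + 1]))
        for n in range(n_terms)
    )


def subinterval_end(q):
    """Moving cost at which the first look probability y^0 reaches one."""
    r = 1 - q
    s = math.sqrt(5 + 4 * r)
    return (10 + 3 * r - 4 * r * r + (2 + r) * s) / (2 * (4 + 2 * r - r * r))


if __name__ == "__main__":
    q = 0.5
    print(second_equation_residual(q, 1.5, 20))
    print(transient_solution(q, 1.5, 41)[3][40], 1 / (3 - 1.5))
    print(subinterval_end(q), transient_solution(q, subinterval_end(q), 1)[3][0])
```

Under these assumptions the residual vanishes to rounding error, $y^n$ settles at $1/(3 - \mu)$, and $y^0$ equals one at the moving cost given by the expression above.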

The first part of the second strategy interval, therefore, ends at this point and does not extend
to $\mu_p$.

In any strategy subinterval, the look probabilities satisfying the difference equations are used
after a start-up process has occurred. The difference equations themselves assure the proper
admissible moves once the start-up process has ended. The coefficient A is adjusted to insure that
the required moves during the start-up process are admissible.

In succeeding subintervals, the start-up process lasts longer. During this process, some
of the looks are made deterministically, and some of the moves that were previously admissible
no longer are. Once the start-up process has been completed, the look probabilities are the
same as before except that the coefficient A is different. When the start-up process is over, the
evader has managed to maneuver P into the position where $p_i = p$, $p_j = p_k = (1 - p)/2$, where i
represents the last box examined. As $\mu$ approaches $\mu_p$, the start-up process becomes infinitely
long. The interval $1 < \mu < 2 = \mu_p$ is, therefore, partitioned into an infinite number of strategy
subintervals. The start-up process differs from subinterval to subinterval, whereas the general
behavior thereafter is the same for all those subintervals belonging to the same strategy interval
$(\mu_1, \mu_p)$. In a more general N-box game, there may be a finite number of strategy intervals, but
there will always be an infinite number of subintervals in all but the first.

8.4.2  Approximate  Solutions  for  G 

In the example just discussed, an exact solution of G was found in the first strategy interval
and in the first subinterval of the second. Finding the solution in any other of these subintervals
would be an enormous task, and it should be apparent that the prospects of finding an exact
solution to a more general N-box form of G are slight, if not out of the question.

Methods for finding approximate solutions are, therefore, in order. Although no method
has been developed, a general approach to finding an approximately good search strategy is
suggested for future research. With this approach, we assume that the searcher has a poor memory
and can remember only where he has made his last n looks. Since he can use only information
he remembers, his optimum search strategy under this condition can be generated by a Markov
process. In particular, each recurrent state in the process is defined by where the last n looks
were made. Transient states that apply when fewer than n looks have been made also exist.

As an example, let us consider the approach that could be taken when n = 2. For simplicity,
we shall let $\rho_i = \tau_i = 1$ and $\eta_i = \lambda_i = 0$. Letting $\sigma_{ji}$ represent the state that applies when the last
look was made into box i and was preceded by a look into j, we can express the payoffs associated
with $\sigma_{ji}$ in the form

$$W'_{ji}(P) = \sum_{k=1}^{N} a'_{ji}(k)\, p_k ,$$

$$W_{ji}(P) = \sum_{k=1}^{N} a_{ji}(k)\, p_k .$$

The payoff $W_{ji}(P)$ applies when the evader can move before the next look and $W'_{ji}(P)$ applies when
this opportunity has been lost. The cost of moving from box k to box $\ell$ is $\mu_{k\ell}$. Therefore,

$$a_{ji}(k) = \max_{\ell}\,\{-\mu_{k\ell} + a'_{ji}(\ell)\} .$$

With each state $\sigma_{ji}$ we can associate a set of look probabilities $y_{ji}(k)$, where $\sum_{k=1}^{N} y_{ji}(k) = 1$.
If box k is examined, a transition to $\sigma_{ik}$ occurs. Therefore, each $a'_{ji}(\ell)$ can be written in terms of
the look probabilities $y_{ji}(k)$ and the coefficients $a_{ik}(\ell)$ of the successor states $\sigma_{ik}$.

Similar equations can be written for each of the transient states. At the beginning of the
game, the payoff $W_0(P) = \sum_{k=1}^{N} a_0(k)\, p_k$ applies, and the searcher limits the evader to
$\max_k\{a_0(k)\} = \max_k\{a'_0(k)\}$. To find the look probabilities that minimize the above expression,
a nonlinear programming routine is necessary. It is this routine that must be studied in detail.

As n approaches infinity, the searcher's memory improves, and in the limit, the approximating
strategy approaches the good search strategy. In the process, the number of states in
the Markov process can increase rapidly. With a memory of n looks, there can be $N^n$ recurrent
states and $\sum_{j=0}^{n-1} N^j$ transient states.
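The state count can be checked by direct enumeration. The following is a minimal sketch, assuming the boxes are labeled 1 to N and each state is identified with the tuple of looks the searcher still remembers; the function name `limited_memory_states` is hypothetical and does not come from the report.

```python
from itertools import product


def limited_memory_states(num_boxes, memory):
    """Enumerate the look histories that label the states of the Markov process.

    Assumption (not from the report): a state is the tuple of remembered looks.
    """
    boxes = range(1, num_boxes + 1)
    # Transient states: fewer than `memory` looks have been made (including none).
    transient = [h for length in range(memory)
                 for h in product(boxes, repeat=length)]
    # Recurrent states: the last `memory` looks are all known.
    recurrent = list(product(boxes, repeat=memory))
    return transient, recurrent


transient, recurrent = limited_memory_states(num_boxes=3, memory=2)
print(len(recurrent), len(transient))
```

The enumeration grows geometrically with the memory length, which is why pruning superfluous states before attempting a solution matters.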

The total number may not be as large, however, for some states may be superfluous. When
n = 3, for example, the probability of making a particular look may equal zero for all i. If this
occurs, the corresponding state will never be entered and is of no interest. The problem, of course,
is to find a method for predicting such an event before the solution is attempted. Such a method
may lie in solving the n - 1 approximation first. It appears likely that if a look probability equals
zero for all j in the n - 1 approximation, the corresponding look probabilities must
also equal zero for all i, j in the n approximation. If this property can be proved to exist, the
problem  of  finding  a  good  approximation  to  the  searcher's  good  strategy  can  be  greatly  simplified. 
The  solutions  to  the  approximations  of  order  0,  1,  2,  .  .  .  can  be  found  in  order,  and  the  process 
can  be  stopped  when  diminishing  returns  are  found  or  when  the  computational  effort  becomes 
too  large. 


CHAPTER  9 
CONCLUSION 


The  game  that  we  have  considered  is  a  two-sided  extension  of  a  one-sided  search  problem. 
Although  all  search  problems  need  not  be  considered  from  a  two-sided  point  of  view,  this  is  some¬ 
times  necessary.  In  our  game,  the  search  is  directed  against  a  conscious  evader  or  an  object 
controlled  by  such  an  evader.  The  evader  can  observe  the  searcher's  actions  and  can  capitalize 
on  any  errors  he  makes.  At  the  beginning  of  the  game,  the  evader  hides  in  one  of  several  boxes, 
each  of  which  has  an  associated  detection  probability.  The  search  process  consists  of  a  sequence 
of  looks  into  the  various  boxes  until  the  evader  is  found.  Each  look  into  a  given  box  takes  a  fixed 
amount  of  time.  A  particular  evasion  device  —  moving  between  looks  —  has  been  assumed.  The 
game  is  zero-sum,  and  a  fairly  general  reward  structure  that  can  include  discounting  has  been 
developed.  The  reward  coefficients  associated  with  this  structure,  as  well  as  the  location  of  the 
boxes  and  their  detection  probabilities,  are  known  to  both  players. 

We have been able to derive the good strategies for the two players when the game involves
two boxes. In $G^\infty$, exact solutions can be obtained when there exists a pair of integers $n_1$ and $n_2$
such that $r_1^{n_1} = r_2^{n_2}$, where the escape probabilities $r_1$ and $r_2$ are the complements of the detection
probabilities. An exact solution can also be found if one or both of the detection probabilities are
equal to unity. When the ratio of the escape probabilities is irrational, an approximate solution
can be obtained. This approximation can be made to any desired degree of accuracy. In game G,
where moving is allowed at a cost, the searcher's good strategy is identical to that
of $G^\infty$ if the moving costs are prohibitive. When these moving costs are not prohibitive, the exact
good strategies can be found in general. Exact solutions can also be obtained in G°, where the
moving costs are equal to zero.

The  search  evasion  game  becomes  much  more  complex  when  there  are  three  or  more  boxes. 
Although G° may still be solved exactly, the computational effort required to solve $G^\infty$ can be
prohibitive. When more than two boxes are involved, the general properties of G become quite
different, and the good search strategy can no longer be generated by a finite Markov process. The
limited  memory  approach  to  finding  an  approximation  to  the  good  search  strategy  is  suggested  for 
future  research.  Such  a  strategy  can  still  be  generated  by  a  Markov  process. 

The results of our study of $F^\infty$ may be useful in treating some one-sided search problems.
The optimum search strategy is quite simple when the position of the evader, or object, can be
described by a probability vector. Only the problem of locating the point at which $U^\infty(P)$ is a
maximum causes the solution of the N-box form of $G^\infty$ to be difficult. If the object is not a
conscious entity whose motives are opposed to those of the searcher, we have no reason to suppose
that the worst of all probability vectors applies. Any reasonable statistical estimate of the
position of the object should be preferable to taking the pessimist's approach, i.e., using the minimax
solution.

Most of the reward structure that has been developed could be useful in treating a one-sided
search problem of this type. The detection loss $\lambda_i$ could be used to represent a reward associated
with finding the object. The reward associated with a given look could be used to represent the
cost of making the look. It would be difficult to imagine a problem in which a look was less costly
to make if the object were in the box examined. Therefore, the $\eta_i$'s would normally be equal to
zero. In most situations, there would also be no reason to suppose that the earning rates varied
from box to box. If one wished to locate a faulty part that was causing damage over time in a
complex  system,  however,  they  might  be  of  use.  Clearly,  the  set  of  look  times  and  discounting 
could  be  useful  in  a  practical  one-sided  search  problem. 

At  this  point,  it  is  worthwhile  to  review  some  of  the  qualitative  aspects  of  the  good  strategies 
associated  with  the  two-box  form  of  G.  Let  us  first  consider  the  evader's  good  strategy.  If  an 
arbitrary  strategy  is  assigned  to  the  evader,  we  may  define  his  position  as  the  search  process 
proceeds  by  means  of  a  probability  vector.  If  the  probability  that  he  is  in  one  box  becomes  suf¬ 
ficiently  large,  the  evader  should  move  from  this  box  if  he  is  there  with  a  certain  probability. 

This  causes  the  probability  vector  describing  his  position  to  be  transformed  to  the  nearest  bound¬ 
ary  of  the  no-move  region. 

The  searcher's  good  strategy  can  be  generated  by  a  finite  Markov  process.  In  some  states 
of  this  process,  the  next  look  is  made  deterministically.  In  others,  the  mixed  states,  the  next 
look  is  made  according  to  a  probability  distribution.  When  the  searcher  uses  his  good  strategy, 
the  evader  will  collect  a  payoff  equal  to  the  value  if  he  never  moves.  That  is,  not  moving  is  al¬ 
ways  an  admissible  alternative  of  the  evader's  good  strategy.  In  certain  situations,  a  particular 
move  is  also  admissible.  As  the  moving  costs  increase,  deterministic  looks  are  made  more  fre¬ 
quently,  and  the  situations  in  which  a  move  is  admissible  occur  less  frequently. 

When  the  moving  costs  are  prohibitive,  the  searcher's  good  strategy  is  identical  to  the  one 
that applies in $G^\infty$, the game in which moving is prohibited. In this strategy, the searcher makes
a random selection from two infinite search sequences. Once this choice has been made, the
search  process  is  completely  deterministic.  This  strategy  minimizes  the  payoff  that  results  if 
the  evader  never  moves.  The  evader  should  not  move  because  such  an  action  can  only  decrease 
the  payoff. 

When  the  moving  costs  are  not  prohibitive,  the  searcher  should  not  use  his  good  strategy  that 
applies in $G^\infty$. If he were to use this strategy, the evader could gain a definite advantage (perhaps
a  very  large  one)  by  using  a  strategy  that  involved  some  deterministic  moves.  No  search  strategy 
that  allows  the  evader  to  gain  a  definite  advantage  by  moving  can  be  the  good  search  strategy. 
Therefore,  the  good  search  strategy  is  the  one  that  minimizes  the  no-move  payoff  without  violating 
this  condition. 

In  a  sense,  the  good  search  strategy  maximizes  the  number  of  situations  in  which  moving  is 
an  admissible  alternative  subject  to  the  above  condition.  In  each  strategy  interval,  a  particular 
transition  diagram  is  associated  with  the  Markov  process  that  generates  the  good  search  strategy. 
This  diagram  includes  one  or  two  moving  states.  In  such  a  state,  a  particular  move,  as  well  as 
not  moving,  is  an  admissible  alternative.  In  the  remaining  states,  no  moves  are  admissible.  If 
the  moving  costs  are  increased  sufficiently,  a  new  strategy  interval  will  apply.  In  the  associated 
transition  diagram,  there  are  more  states  in  which  no  moves  are  admissible.  One  cannot  use  the 
previous  transition  diagram  in  this  strategy  interval.  If  one  were  to  try,  he  would  find  that  some 
of  the  transition  probabilities,  or  look  probabilities,  associated  with  the  mixed  states  would  be 
negative. 

In  the  N-box  form  of  G,  the  good  search  strategy  cannot  be  generated  by  a  finite  Markov 
process.  Associated  with  the  evader's  good  strategy  is  a  set  of  situations  in  which  moving  is  re¬ 
quired  with  a  nonzero  probability.  No  Markovian  search  strategy  can  yield  a  payoff  that  is  indif¬ 
ferent  to  whether  or  not  the  move  is  made  in  each  of  these  situations.  (The  one  exception  to  this 
statement  applies  in  the  first  strategy  interval.)  In  the  symmetric  three-box  example  that  was 
solved, we saw that there was a Markovian search strategy which allowed all of the moves
associated  with  the  evader's  good  strategy  to  be  admissible  except  those  that  applied  during  the 
start-up  process.  Unfortunately,  the  start-up  process  occurs  at  the  beginning  of  the  game,  and 
the  early  behavior  of  the  game  has  the  strongest  influence  on  the  payoff. 

In  G°,  no  cost  is  incurred  by  the  evader  when  he  moves.  As  a  result,  the  searcher  cannot 
gain  any  inference  concerning  the  evader's  position  from  his  past  sequence  of  unsuccessful  looks, 
and  each  look  should  be  made  according  to  the  same  probability  distribution.  The  good  search 
strategy  causes  the  payoff  to  be  independent  of  the  evader's  position  and  vitiates  the  influence  of 
moving  (all  moves  are  admissible). 
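The equalizing idea can be illustrated numerically. The sketch below is a simplified illustration rather than the report's general solution: it assumes unit look times, no discounting, and a payoff equal to the expected number of looks until detection. Under those assumptions, choosing look probabilities $y_k$ so that $y_k q_k$ is the same for every box makes each look equally likely to succeed wherever the evader sits. The function names are hypothetical.

```python
def equalizing_look_probabilities(detection_probs):
    """Return look probabilities y_k such that y_k * q_k is equal for all boxes.

    Assumes every detection probability q_k is strictly positive.
    """
    inverse = [1.0 / q for q in detection_probs]
    total = sum(inverse)
    return [w / total for w in inverse]


def expected_looks(detection_probs, look_probs, box):
    """Expected number of looks until detection if the evader is in `box` at every look."""
    per_look_detection = look_probs[box] * detection_probs[box]
    return 1.0 / per_look_detection


q = [0.5, 0.25]
y = equalizing_look_probabilities(q)
# Both entries agree, so free moves between the boxes cannot change the payoff.
print([expected_looks(q, y, box) for box in range(len(q))])
```

With the general reward structure and unequal look times, the equalizing condition involves the reward coefficients as well, so the probabilities above should be read only as the simplest special case.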

When the N-box form of G° was considered, we saw that the associated good search strategy
may  be  useful  when  evasion  devices  not  included  in  our  game  are  considered.  For  example,  the 
evader  may  be  able  to  select  a  favorable  time  after  the  search  has  started  to  enter  the  game.  He 
may  also  be  able  to  temporarily  suspend  production  or  leave  the  game.  These  additional  devices 
do  not  aid  the  evader  if  the  searcher  uses  his  good  strategy  associated  with  G°.  This  strategy 
would  not  be  the  good  strategy  in  the  more  general  game  of  this  type.  It  can  be  calculated,  how¬ 
ever,  and  it  may  prove  useful  in  a  practical  two-sided  search  problem  involving  these  additional 
devices. 

Although  the  search  evasion  game  we  have  studied  includes  only  one  evasion  device,  it  has 
demonstrated  the  interesting  influence  that  a  conscious  evader  can  have  on  the  outcome  of  a 
search  process.  This  study  should  only  whet  the  appetite  for  deeper  studies.  In  a  more  general 
game,  it  will  be  more  difficult  to  find  exact  solutions.  In  fact,  there  is  no  reason  to  suppose  that 
a  value  and  good  strategies  will  always  exist.  Nevertheless,  the  further  development  of  relatively 
efficient  search  procedures  that  take  the  actions  of  the  evader  into  account  should  prove  interest¬ 
ing  and  useful. 
APPENDIX A

THE SEARCHER'S OPTIMUM STRATEGY IN $F^\infty$

The searcher's optimum strategy in $F^\infty$ requires each next look to be made into a box for
which $p_i \beta_i$ is a maximum, where $P = \{p_i\}$ represents the value of the probability vector that
applies when the decision is made and

$$\beta_i = \frac{q_i d^{\tau_i}}{\gamma_i}\,(\rho_i + \alpha\lambda_i) + \alpha\eta_i .$$

In order to prove this, let us adopt the following notation. Let S represent an infinite
search sequence, and let $S = (s_i, s_j, \ldots, S_x)$ represent a partition of this sequence into an ordered
set of subsequences where $s_i, s_j, \ldots$ are finite and $S_x$ is infinite. Let $\tau_i$ represent the length of
$s_i$ in time. Let $P_{ij\ldots}$ represent the a posteriori position into which the a priori vector P is
transformed by the sequence $(s_i, s_j, \ldots)$ if detection does not occur.

Letting $U^\infty(P; S) = U^\infty(P; s_i, s_j, S_x)$ represent the payoff given P and $S = (s_i, s_j, S_x)$, we may
express this payoff in the form

$$U^\infty(P; s_i, s_j, S_x) = f(P; s_i) + g(P; s_i)\, d^{\tau_i}\, U^\infty(P_i; s_j, S_x)$$

$$= f(P; s_i, s_j) + g(P; s_i, s_j)\, d^{\tau_i + \tau_j}\, U^\infty(P_{ij}; S_x) .$$

The contribution $f(P; s_i)$ to the payoff occurs during the subsequence $s_i$ when P is the a priori
value of the state vector; $g(P; s_i)$ equals the probability that detection does not occur during $s_i$.

Let us consider an arbitrary infinite sequence $S_I$ and an arbitrary a priori P. Let $s_a$ represent
the subsequence of $S_I$ that starts at the beginning of $S_I$ and continues up to, but does not
include, the first look that violates the optimum search rule ($s_a$ may be empty). Let C represent
the set of boxes that could be examined on the next look without violating the optimum search
rule, and let $s_b$ be the maximum subsequence following $s_a$ that does not include a look into a
box belonging to C. Letting B represent the set of boxes that are examined at least once in $s_b$,
we see that $B \cap C = \emptyset$.

The next look following $s_b$ is into a box belonging to C. This box will be called box c, and
the look into this box will form the subsequence $s_c$. Let $S_d$ be the remainder of $S_I$. For the
moment, let us assume that $s_b$ is finite. Then $S_I = (s_a, s_b, s_c, S_d)$.

By reversing the order of $s_b$ and $s_c$, we can define a new sequence

$$S_{II} = (s_a, s_c, s_b, S_d) .$$

This definition will extend the sequence of optimum looks by one look [$s_a' = (s_a, s_c)$], and we
must show that

$$\Delta = U^\infty(P; S_I) - U^\infty(P; S_{II}) \ge 0 .$$

These payoffs may be expressed in the form

$$U^\infty(P; S_I) = f(P; s_a) + g(P; s_a)\, d^{\tau_a}\, U^\infty(P_a; s_b, s_c, S_d)$$

$$U^\infty(P; S_{II}) = f(P; s_a) + g(P; s_a)\, d^{\tau_a}\, U^\infty(P_a; s_c, s_b, S_d) .$$


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k. 


g(P:s^)  d  f(Pc:Sb)  =  d 


""  E  jpi'Ji  i  -r'  (pi(--a'  )-Vi^ "  z 

icB  1  j=l  '  '  k=l 

.-■'I  ••■■■■ 


,(k) 


-  X. 


+  p  d 


/('B 


f(P:s^)=  ^  PjPiVc +Pc['>'c<Pc-’'c’-‘ic‘*  ‘'^c]  +  E  PiPi^c  : 


i£B 


If^c 

i<B 


g(P:  Sj,)  d  ''f(Pj^:  S^)  =  d  2  Pi>'i  'pjYj.  +  d 


'*'c<Pc  “  ''c>  “ 


i€B 

Tu 


''^1 


d  E  PiPiV. 


i^c 

f'B 


By collecting terms and noting that $\gamma_c = (1 - d^{\tau_c})/\alpha$ and $\gamma_i = (1 - d^{\tau_i})/\alpha$, we can reduce
$\Delta$ to the form


k. 


A  = 


.-s  .K'"' 

ieB  '  '  j=l 


-  E  Pi^ii 


(1  -d’^'")  (1  -d’^^)  .■’’i 


ieB 


k. 


r.(k)  k.  '  T.(j) 

1  .  _  1  V  J  1 


q.  E  E  d '  +r.^  i;  d 


[  j=l  k=l 

■^c  i-1 

-  E  Pi^i(l  -  d  )  qj  X  d  + 


J=1 


k. 


p  p  q  d 
•^c  c  m 


ieB 


J='l 


T  T, 

^  (1  -  d  (1  -  d  ^  ^  ,'^c,.  Jb, 

+  p  V  - - — _ 1  +  pqXd  (1— d  ) 

^c'c  a  c 


But, 


k.  k. 

V  i-1  V  ^i*j> 

E  r.J  ^d  *  <  2  d  ' 


J=1 


j=l 




n' 


r'^it 


and 


j=i 


‘Ji  2  s 

j=l  k=l 


k. 

1 

s 


j=l 


Therefore, 


A  >  A^  =  -  Yi  Pjd  -  d 


ieB 


+  Pc  -  d 


i/i  l¥‘-(‘^)-"v.‘.| 

’ii /•=  , ,  ,  ,  X  d'=l  , 

f  \  a  I  c  c  I 


But, 


k. 


therefore, 


ieB 


^  ^  =  E  ^jr^d  ‘ 

ieB 


}=i 


r  k. 


i  d"‘'’ 


L  j=i 


{p  p  —  p.p .) 

'^c  c  ^11 


Since $p_c \beta_c \ge p_i \beta_i$ for all i belonging to B,

$$\Delta \ge \Delta' \ge 0 .$$

If the sequences $s_b$ and $s_c$ are reversed, therefore, the payoff is reduced. If $s_b$ were to
increase in length and become infinite, the inequality $\Delta \ge \Delta'$ could only become stronger, since
some of the members of $\{k_i\}$ must eventually become infinite. Therefore, if $S_I = (s_a, S_b)$, the
payoff can be reduced by inserting a look into a box belonging to C between $s_a$ and $S_b$. Hence,
the sequence $S_{II} = (s_a, s_c, S_b)$ would yield a lower payoff than $S_I = (s_a, S_b)$. The process can be
continued indefinitely until $S_I$ has been replaced by a sequence $S_a$. This yields the minimum
payoff, and the assumed optimum search rule is indeed optimum.
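The search rule proved above lends itself to a short simulation. The sketch below is illustrative only: it takes the coefficients $\beta_i$ as given inputs rather than computing them from the reward structure, breaks ties in favor of the lowest box index (an assumption), and updates the probability vector by Bayes' rule after each unsuccessful look. The function names are hypothetical.

```python
def posterior_after_miss(p, box, q):
    """Bayes update of the position vector after an unsuccessful look into `box`."""
    miss = 1.0 - p[box] * q[box]  # probability that this look fails to detect
    return [(pi * (1.0 - q[i]) if i == box else pi) / miss
            for i, pi in enumerate(p)]


def greedy_search(p, q, beta, num_looks):
    """Look repeatedly into a box that maximizes p_i * beta_i.

    p    : a priori probability vector over the boxes
    q    : detection probabilities
    beta : per-box coefficients, supplied by the caller (assumed inputs)
    """
    looks = []
    for _ in range(num_looks):
        box = max(range(len(p)), key=lambda i: p[i] * beta[i])
        looks.append(box)
        p = posterior_after_miss(p, box, q)
    return looks, p


looks, final_p = greedy_search([0.6, 0.4], [0.5, 0.5], [1.0, 1.0], num_looks=3)
print(looks, final_p)
```

Because the rule depends only on the current probability vector, the whole infinite search sequence is fixed once the a priori vector is known.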
APPENDIX B

SOME PROPERTIES OF THE TWO-BOX MODIFIED GAMES F AND F'



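Consider the sets of truncated modified games $\{F_n\}$ and $\{F'_n\}$. These games have the same
definitions as F and F' except that $F'_n$ follows $F_n$ and $F_{n-1}$ follows $F'_n$. In $F_0$, play stops and
the evader collects the payoff V°, the value of G°. Associated with $F_n$ and $F'_n$ are the payoff
functions $U_n(P)$ and $U'_n(P)$, respectively, which apply when both players use optimum strategies.
The functional equations are

$$U'_n(P) = \min\,\{U'_n(P; 1),\; U'_n(P; 2)\}$$

and

$$U_n(P) = \max_{P'} \begin{cases} -\mu_1 (P' - P) + U'_n(P'), & P' \ge P \\ -\mu_2 (P - P') + U'_n(P'), & P' \le P \end{cases}$$

$$U_0(P) = V^\circ ,$$

where

$$U'_n(P; 1) = P\,[\gamma_1(\rho_1 - \eta_1) - q_1 d^{\tau_1}\lambda_1] + (1 - P)\,\gamma_1\rho_2 + [P r_1 + 1 - P]\, d^{\tau_1}\, U_{n-1}\!\left[\frac{P r_1}{P r_1 + 1 - P}\right]$$

$$U'_n(P; 2) = P\,\gamma_2\rho_1 + (1 - P)\,[\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2] + [P + (1 - P) r_2]\, d^{\tau_2}\, U_{n-1}\!\left[\frac{P}{P + (1 - P) r_2}\right] .$$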
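The recursion for the truncated games can be explored numerically. The following is a rough grid-based sketch, not the analytical piecewise-linear construction used in this appendix. It assumes that the bracketed reward terms are passed in as precomputed constants (`A1`, `A2` for the terms multiplying the probability that the evader is in the box examined, `B1`, `B2` for the other terms, `d1`, `d2` for the discount factors $d^{\tau_1}$, $d^{\tau_2}$), that `V0` is supplied rather than solved for, and that `mu1` is the cost rate charged when P' exceeds P. All names and parameter values are hypothetical.

```python
def interp(values, x):
    """Linear interpolation of a function tabulated on an even grid over [0, 1]."""
    size = len(values)
    pos = min(max(x, 0.0), 1.0) * (size - 1)
    lo = min(int(pos), size - 2)
    frac = pos - lo
    return (1.0 - frac) * values[lo] + frac * values[lo + 1]


def move_cost(P, P_new, mu1, mu2):
    """Cost of shifting the probability of being in box 1 from P to P_new."""
    return mu1 * (P_new - P) if P_new >= P else mu2 * (P - P_new)


def truncated_values(n, par, grid_size=101):
    """Tabulate U_n and U'_n on a grid, starting from U_0(P) = V0."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    u = [par["V0"]] * grid_size
    u_prime = list(u)
    for _ in range(n):
        u_prime = []
        for P in grid:
            m1 = P * par["r1"] + 1.0 - P          # no detection after a look into box 1
            m2 = P + (1.0 - P) * par["r2"]        # no detection after a look into box 2
            post1 = P * par["r1"] / m1 if m1 > 0 else 0.0
            post2 = P / m2 if m2 > 0 else 1.0
            look1 = P * par["A1"] + (1 - P) * par["B1"] + m1 * par["d1"] * interp(u, post1)
            look2 = P * par["B2"] + (1 - P) * par["A2"] + m2 * par["d2"] * interp(u, post2)
            u_prime.append(min(look1, look2))     # the searcher minimizes
        # The evader maximizes over the position P' he moves to, net of moving cost.
        u = [max(u_prime[j] - move_cost(P, grid[j], par["mu1"], par["mu2"])
                 for j in range(grid_size))
             for P in grid]
    return grid, u, u_prime


params = dict(V0=5.0, r1=0.5, r2=0.5, A1=0.2, A2=0.2, B1=1.0, B2=1.0,
              d1=0.9, d2=0.9, mu1=0.3, mu2=0.3)
grid, u_n, u_prime_n = truncated_values(3, params)
print(u_n[50], u_prime_n[50])
```

Since staying put costs nothing, the tabulated $U_n$ can never fall below $U'_n$ at any grid point, which gives a quick sanity check on an implementation.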


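We shall require both boxes to be strictly admissible; that is,

$$\frac{\rho_1}{\alpha} > \frac{\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2}{1 - r_2 d^{\tau_2}} , \qquad \frac{\rho_2}{\alpha} > \frac{\gamma_1(\rho_1 - \eta_1) - q_1 d^{\tau_1}\lambda_1}{1 - r_1 d^{\tau_1}} .$$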


Theorem 1.

For all n > 0: $U_n(P)$ and $U'_n(P)$ are continuous and convex.

In $F_0$, the function $U_0(P) = V^\circ$. Hence, $U_0(P)$ is continuous and convex. To prove the
theorem, we shall show that

$U_{n-1}(P)$ is continuous and convex $\Longrightarrow$ $U_n(P)$ and $U'_n(P)$ are continuous and convex.

For convenience, we shall also use the same technique to show that $U_n(P)$ and $U'_n(P)$ are both
piecewise linear for all finite n.

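Assume that $U_{n-1}(P)$ is continuous, piecewise linear, and convex. Consider

$$U'_n(P; 1) = P\,[\gamma_1(\rho_1 - \eta_1) - q_1 d^{\tau_1}\lambda_1] + (1 - P)\,\gamma_1\rho_2 + (P r_1 + 1 - P)\, d^{\tau_1}\, U_{n-1}\!\left[\frac{P r_1}{P r_1 + 1 - P}\right] .$$

Clearly, $U'_n(P; 1)$ must be continuous since $U_{n-1}(P)$ is continuous. The function $U_{n-1}(P)$ is linear
over a set of intervals $\{\pi_i^{n-1}\}$ that form a partition of the interval (0, 1). Over $\pi_i^{n-1}$ we may
express $U_{n-1}(P)$ in the form

$$U_{n-1}(P) = a_i^{n-1} P + b_i^{n-1} (1 - P) .$$

Define $\pi_i^n$ by the relation

$$P \in \pi_i^n \iff \frac{P r_1}{P r_1 + 1 - P} \in \pi_i^{n-1} .$$

For all $P \in \pi_i^n$:

$$U'_n(P; 1) = P\,[\gamma_1(\rho_1 - \eta_1) - q_1 d^{\tau_1}\lambda_1 + r_1 d^{\tau_1} a_i^{n-1}] + (1 - P)\,[\gamma_1\rho_2 + d^{\tau_1} b_i^{n-1}]$$

$$= a_i^n P + b_i^n (1 - P) .$$

Hence, $U'_n(P; 1)$ is linear over each interval belonging to $\{\pi_i^n\}$, where $\{\pi_i^n\}$ partitions
the interval (0, 1).

Let $\pi_j > \pi_i$ mean that for all $P_i \in \pi_i$ and for all $P_j \in \pi_j$, $P_j > P_i$. The function
$U_{n-1}(P)$ is convex. Hence,

$$\pi_j^{n-1} > \pi_i^{n-1} \Longrightarrow a_j^{n-1} \le a_i^{n-1} \text{ and } b_j^{n-1} \ge b_i^{n-1} .$$

Therefore,

$$\pi_j^n > \pi_i^n \Longrightarrow a_j^n \le a_i^n \text{ and } b_j^n \ge b_i^n ,$$

and $U'_n(P; 1)$ is also convex.

The same reasoning can be used to show that $U'_n(P; 2)$ is continuous, piecewise linear, and
convex;

$$U'_n(P) = \min\,\{U'_n(P; 1),\; U'_n(P; 2)\} .$$

Therefore, $U'_n(P)$ is also continuous and convex. It is piecewise linear as long as $U'_n(P; 1)$ and
$U'_n(P; 2)$ do not intersect at an infinite number of points. In proving Theorem 2, we shall show
that these functions intersect at a unique point $P_0^n \in (0, 1)$. Hence, $U'_n(P)$ is piecewise linear.

$U_n(P)$ can be constructed from $U'_n(P)$ by using the techniques discussed in Sec. 4.3, and
$U_n(P)$ must also be continuous, piecewise linear, and convex.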


Theorem 2.

In $F'_n$, there exists a unique $P_0^n$:

$$0 < P_0^n < 1$$

$$P < P_0^n \Longrightarrow U'_n(P; 1) > U'_n(P; 2)$$

$$P > P_0^n \Longrightarrow U'_n(P; 1) < U'_n(P; 2)$$

Let us adopt the notation

$$\delta'_n(P'; i)+ = \left.\frac{dU'_n(P; i)}{dP}\right|_{P'+} , \qquad \delta'_n(P'; i)- = \left.\frac{dU'_n(P; i)}{dP}\right|_{P'-} .$$

To ease the notation further, we shall assume that a statement concerning $\delta'_n(P; i)+$ over an
interval $(P_j, P_k)$ applies only for $P_j \le P < P_k$, unless an explicit statement is made to the
contrary. Similarly, a statement concerning $\delta'_n(P; i)-$ over $(P_j, P_k)$ will apply only if $P_j < P \le P_k$.
A statement concerning an unsigned quantity $\delta'_n(P; i)$ will apply to both $\delta'_n(P; i)+$ and $\delta'_n(P; i)-$ once
the above condition is imposed. If a statement concerns several unsigned quantities, such as
$\delta'_n(P; 1)$ and $\delta'_n(P; 2)$, it will be inferred that the statement applies as long as both quantities are
evaluated by taking the limit of the derivative in the same way.

We shall prove the theorem by demonstrating a stronger property; namely,

for all P ∈ (0, 1):

$$\delta'_n(P; 2) > \delta'_n(P; 1)$$

and

$$U'_n(0; 2) < U'_n(0; 1)$$

$$U'_n(1; 2) > U'_n(1; 1)$$

where n ≥ 1.

First consider $F'_1$:

$$U'_1(P; 1) = P\,[\gamma_1(\rho_1 - \eta_1) - q_1 d^{\tau_1}\lambda_1] + (1 - P)\,\gamma_1\rho_2 + [P r_1 + 1 - P]\, d^{\tau_1} V^\circ$$

$$U'_1(P; 2) = P\,\gamma_2\rho_1 + (1 - P)\,[\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2] + [P + (1 - P) r_2]\, d^{\tau_2} V^\circ$$

The game is equivalent to F° when the evader is not allowed to move until after the first look.
Both boxes are strictly admissible. The functions $U'_1(P; 1)$ and $U'_1(P; 2)$ must intersect at the
point $P_0^1$ that corresponds to the evader's good strategy in G°, where $0 < P_0^1 < 1$ (see Appendix D).
Both functions are linear over (0, 1). By taking the derivative of $U'_1(P; 2)$ and applying the
inequalities

$$\frac{\rho_1}{\alpha} > V^\circ > \frac{\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2}{1 - r_2 d^{\tau_2}} ,$$

it can be shown that

for all P ∈ (0, 1):

$$\delta'_1(P; 2) > 0 .$$

Similarly,

for all P ∈ (0, 1):

$$\delta'_1(P; 1) < 0 .$$

Therefore, the required properties are satisfied in $F'_1$.

We shall now assume that these properties are satisfied in $F'_{n-1}$, where n ≥ 2; moreover,
we shall show that they are also satisfied in $F'_n$. To do this, we must first consider the special
modified game $F''_n$. In $F''_n$, the evader is not allowed to move until two looks have been made;
that is,

$$F''_n \xrightarrow{\ 1\ \text{look}\ } F'_{n-1} \xrightarrow{\ 1\ \text{look}\ } F_{n-2} .$$

Let $U''_n(P; ij)$ represent the payoff in $F''_n$ when the evader uses an optimum strategy and the
searcher an optimum strategy after looking first into box i and then into box j. No moving can
occur until two looks have been made. Therefore, both $U''_n(P; 12)$ and $U''_n(P; 21)$ can be expressed
as functions of the first two looks plus
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1 


Tl+T,  r  Pr,  1 

fPri  +  (1  -  P)  d  [pr;  Ml  -  P)  r^J  ' 

Taking  the  difference  between  tJ'^(P;  12)  and  UJ|^(P;21)  cancels  this  unknovm  term,  and  it  can  be 
shown  that 

for  all  Pe(0,l): 

6'^(P;21)  >  5'^(P;  12) 

The  functions  U'^(P;  22)  and  U'^(P;21)  differ  only  in  their  dependence  on 

P^-l  [p  +  (1-  P)  r^  '  ^A-1  [p  +  (1  %)  r^  '  '  respectively. 

For  all  P  e  (0.  1): 

d'  ,(P;2)>6'  ,(P;1) 
n-1  n-1 

and  it  can  be  shown  that 

for  all  P  e  (0,  1): 

6'^(P;22)  >  6';^(P;21)  . 

Using  the  same  reasoning  on  U'^(P;  12)  and  Uy^(P;  11)  and  combining  results,  we  find  that 
for  all  P  €  (0,  1); 

a';j(P;22)  >  6yj(P;21)  >  a'^CP;  12)  >  a'^(P;  11)  . 


But, 


U'^(P;i)  =  min 


U';j(P;il) 

U'^(P;i2) 


Therefore, 


for  all  P  e  (0,  1): 

a';(P;2)  >  a'^jiP;  1)  . 


Consider  the  functions  U'  ,{P)  and  U  .(P).  In  these  cases  F"  -*  F'  ,  in  exactly  the  same 
n-1  n-1  n  n-1  ■' 

manner  as  F^  -»  F^  Let  P  and  P^  represent  the  bounding  points  of  the  no-move  region  in 
F^  Furthermore,  define  P_(l),  F  (2),  P^(l)  and  P^(2)  by 

P_(l)  — — >  P_ 

P_(2)  — ^  P_ 

P^(l)  P+ 

P+(Z)  P+  • 


i 
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and  note  that  P  (2)  <  P_(l).  P_|_(2)  P^(l). 

For  all  P  c  (0,  P_): 


For  all  P  c  (P^.  1); 


«n-l<P+>-^«n-l<P>=-'^2»«U(P+)+  • 

From  this,  it  follows  that 

for  all  Pe[0;P_(i)]; 

«;;[P_(i):  ij-  »  6’„(P;  i)  =  «'n[P.(i):  i)-  >  <5;;[P.(i);  i]+  ; 

for  all  P  e  [P^{i),  1]: 

«;;[P+(i):  il-  >  6;,(P;  l)  =  ^  ‘5nlP+(i>:  ^1+  • 

Furthermore,  if  P  <  P^  (i.e.,  P  ^  have 

for  all  P  e  [PJi),  P^(i)]: 

6^(P:  i)  =  a^jCP;  i)  . 

Both  convex.  Therefore,  5J^(P;  i)  and  5^(P;  i)  are  monotonically 

nonincreasing  functions  of  P.  Furthermore,  if  Pj  <  P^, 

«;^(P.;  k)+ «;^(Pj;  k)-  ,  etc. 

Consider  the  interval  (0,  P_(2));  in  this  case 

P  e  (0;  P_(2)]  ==>  P  e  (0;  P_(l)] 

Therefore, 

for  all  P€[0,  P_(2)): 

fi|^{P;  2)  >  6|^(P_(2);  2)+  >  6^[P_{i);  2]-  >  6^[P_(1);  1]- 

(This  statement  need  only  be  considered  when  P  ^0,  and  hence  when  P  (2)  <  P  .(1).) 

Now  consider  the  interval  (P  (2),  P  (1)]  and  note  that  this  interval  may  intersect  [P^(2),  1]. 

For  all  P  e  [P_(2),  P_(1)J: 

a;^(P;  2)  >  «;j[P_(i);  2]-  »  6;;[P_(1);  2]- 

>  6"[P_(1);  1]-  >  6;,[P_(1);  1]- 

=  6;,(P:  1)  . 

Therefore, 
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for  all  Pe  [0,P_(1)]: 

6;^(P;2)>d;^(P;l)  . 

A  similar  development  shows  that 

for  all  Pe[P^(2).  1]: 

a;j(P;2)  >  6;^(P;1)  . 

If  P_(i)  <  P^(2),  we  may  write: 

for  all  P  €  [P_(i),  P^(2)J; 

5'(P;2)  =  6"(P;2)  >  6"(P;  1)  =  6'{P;  1) 
n  n  n  n 

Hence,

for all P ∈ (0, 1):

$$\delta'_n(P; 2) > \delta'_n(P; 1) .$$

To complete the proof, we must show that

$$U'_n(0; 2) < U'_n(0; 1) ,$$

$$U'_n(1; 2) > U'_n(1; 1) .$$

To prove the first inequality, we can apply the inequalities

$$\frac{\gamma_1\rho_2}{1 - d^{\tau_1}} = \frac{\rho_2}{\alpha} > V^\circ \ge \max_P U_{n-1}(P) \ge U_{n-1}(0) \ge U^\infty(0) = \frac{\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2}{1 - r_2 d^{\tau_2}}$$

to the expression

$$U'_n(0; 1) - U'_n(0; 2) = \gamma_1\rho_2 - \gamma_2(\rho_2 - \eta_2) + q_2 d^{\tau_2}\lambda_2 + (d^{\tau_1} - r_2 d^{\tau_2})\, U_{n-1}(0) .$$

The second inequality can be proved by using the equivalent approach (or by switching the labels
on the boxes).

Convergence: We can view the truncated games $F_n$ and $F'_n$ as being those games in which
the moving costs are set equal to zero after n looks have occurred. As n increases, the evader
must wait longer before he can move at no cost. Hence,

for all P ∈ (0, 1), n ≥ 0: $U_n(P) \ge U_{n+1}(P)$;

for all P ∈ (0, 1), n ≥ 1: $U'_n(P) \ge U'_{n+1}(P)$.

Since the evader need incur a moving charge only when it is to his advantage,

for all P ∈ (0, 1), n ≥ 1: $U_n(P) \ge U'_n(P) \ge U^\infty(P)$.

For any fixed P ∈ (0, 1), neither $U_n(P)$ nor $U'_n(P)$ can increase with n, and both functions are
bounded from below by $U^\infty(P)$. Therefore, both functions must converge in the limit and we
can define

$$U(P) \equiv \lim_{n \to \infty} U_n(P)$$

$$U'(P) \equiv \lim_{n \to \infty} U'_n(P) .$$

The functions $U_n(P)$ and $U'_n(P)$ are bounded, continuous, and convex. Hence, in the limit,
U(P) and U'(P) have these properties also. Furthermore, there must exist a unique $P_0$
for which

$$P < P_0 \Longrightarrow U'(P; 2) < U'(P; 1)$$

$$P > P_0 \Longrightarrow U'(P; 2) > U'(P; 1)$$

Although $0 < P_0^n < 1$ for all finite n, we cannot automatically infer that $0 < P_0 < 1$ in the limit.
To prove that this property exists, we must show that

$$U'(0; 2) < U'(0; 1) ,$$

$$U'(1; 2) > U'(1; 1) .$$

These inequalities can be proved true by using the previous method, since the inequalities such
as

$$\frac{\rho_2}{\alpha} > \frac{\gamma_2(\rho_2 - \eta_2) - q_2 d^{\tau_2}\lambda_2}{1 - r_2 d^{\tau_2}}$$

hold in the limit.

Therefore, there exists a pair of functions

$$U(P) \equiv \lim_{n \to \infty} U_n(P)$$

$$U'(P) \equiv \lim_{n \to \infty} U'_n(P)$$

that satisfy Eqs. (4-3) and (4-4) where

(1) Both are bounded, continuous and convex,

(2) There exists a unique $P_0$:

$$P < P_0 \Longrightarrow U'(P; 2) < U'(P; 1)$$

$$P > P_0 \Longrightarrow U'(P; 2) > U'(P; 1)$$


Uniqueness: In $F_n$ the evader can guarantee a payoff of $U_n(P)$ and the searcher can limit
the evader to $U_n(P)$, given P. Hence $U_n(P)$ is the value of $F_n$, given P. Similar considerations
apply in $F'_n$. For all n ≥ 0, P ∈ (0, 1), there exists a minimum probability of detection on the
next look, and this minimum is greater than zero.

This statement applies in the limit as well, since $0 < P_0 < 1$. Hence,

$$\lim_{n \to \infty} \Pr\{F_n \text{ lasts to } F_0\} = 0 .$$

This equation implies that, for any given P ∈ (0, 1), the evader can guarantee a payoff
$\lim_{n \to \infty} U_n(P)$ and the searcher can limit the evader to this amount in F. Similar considerations apply
in F'. Hence, $\lim_{n \to \infty} U_n(P)$ is the value of F, given P, and $\lim_{n \to \infty} U'_n(P)$ is the value of F', given P.
Furthermore, there is complete feedback (for any P) in Eqs. (4-3) and (4-4). Therefore,
$\lim_{n \to \infty} U_n(P)$ and $\lim_{n \to \infty} U'_n(P)$ are the unique bounded solutions to Eqs. (4-3) and (4-4), and they yield the
optimum strategies in F and F'.


Theorem 3.

$P_0 \in (P_-, P_+)$, the no-move region. Assume that $P_0 < P_-$, and define $P_-(1)$ by the relation $P_-(1) \to P_-$.


The function $U(P)$ is linear over $(0, P_-)$. Hence, $U'(P; 1)$ is linear over $[0, P_-(1)]$, where $P_-(1) > P_-$. But,

for all $P \in (P_0, 1)$: $U'(P) = U'(P; 1)$

Therefore, $U'(P)$ is linear over $[P_0, P_-(1)]$, which includes $P_-$ as an interior point. This statement contradicts the definition of $P_-$ in Eq. (4-5).

In the same manner, $P_0$ cannot be greater than $P_+$. Therefore, $P_0 \in (P_-, P_+)$.
APPENDIX C

A  PROPERTY  OF  {w:^(P)} 

In Sec. b.5, the general method for calculating the searcher's optimum strategy in H' (his good strategy in G) was developed. The method was extended to cover the generalized reward structure and discounting in Chapters 6 and 7. One assumption was made which we must now prove, i.e., that

dW.''"(P) 

<  - 5P -  <  M-J  for  each  aF 

The  payoff  associated  with  each  state  in  the  searcher's  Markov  process  is  linear  in  P.  Let 

dW'''(P) 

-  dP  * 

dW,^(P) 

^i  =  dP 

,  dW.^P)  dW.'^(P)  dU.(P)  dU.'(P) 

'i  =  =  -ip-  • 

The  above  functions  are  those  that  apply  when  the  assumed  optimum  search  strategy  is  used.  The 
derivatives  and  6^  are  associated  with  n.*',  and  dJ'  is  associated  with  In  addition,  we  can 
associate  all  of  them  with  an  interval  w.  belonging  to  both  the  no-move  and  recurrent  regions.  We 
can  also  define 


dW,(P) 

dW|(P) 

■iU^CP) 

°+  =  dP 

dP 

dP 

dW  (P) 

dW^(P) 

dU_(P) 

=  dP 

dP 

dP 

Let  us  first  consider  the  case  where  both  moving  regions  extend  into  the  recurrent  region. 
Number  the  intervals  (if  any)  of  the  no-move  region  to  the  left  of  Pq  as  ir  2'  •  ■  ■  •  Number 
the  intervals  (if  any)  that  belong  to  the  no-move  region  and  lie  to  the  right  of  Pq  as  tr^,  ...  . 
Let  TT^  (if  it  exists)  represent  the  interval  belonging  to  the  no -move  region  that  is  adjacent  to  tr 
Similarly,  let  represent  the  interval  in  the  no-move  region  that  is  adjacent  to  (Note  that 
and  may  be  the  same  interval.) 

The recurrent chain of the searcher's Markov process can be partitioned into two parts, $\Sigma_-$ and $\Sigma_+$, and must have the following form:

[Figure: form of the recurrent chain of the searcher's Markov process]
The  states  a  ^  and  (7


are  the  same  if  P  =  P, 


O' 


States  <7  and  a  are  equivalent  if  P  =  P 
1  +  +0 


Let  us  consider  the  effect  that  the  "good"  probability  distribution  Y  ^  =  {y  ^(1),  y  ^(2)}  has 
on  the  payoffs  associated  with  each  crF  e  S—  other  than  cr  .  If  y  ^(2)  =  1,  then  =  6.t.  If,  on 

the  other  hand,  y  ,(2)  =  0,  then  6!’'  =  6.^.  .,  where  6.^,  .  d}.  Since  0  ^  y  ,{2)  i,  we  find  that 

t  r  t  t  ^  t  ^  r 

But,  -^2  <  <5|  and  "  '  ''  - 


latter  event, 


du;(p) 

«  "b  « 


but 


dU'^(P) 

dP 


Therefore,  we  must  prove  that  ^  —  when  o-j^  e  S— . 

Similar  reasoning  can  be  used  to  show  that  if  <7^  e  S+,  then  5j  Hence,  for  any 

state  other  than  belonging  to  S+,  we  find  that  —y^  ^  1^1'  must  prove  that 

if  CT  e  2+. 
a 

We  shall  first  show  that  both  (7^  and  must  belong  to  the  same  set  S  —  or  S+.  Assume  that 


b 

let 


e  S-. 


Let  -*■  (7^  imply  that  a  deterministic  sequence  of  looks  connects  u.  to  .  Also, 


a. 

1 


G.  -*  G 

k  n 


imply  that  the  same  deterministic  sequence  connects  0-5  to  and  to  cr^.  For  the  moment, 
set  y  ^(2)  =  1,  yj(l)  =  i.  Since  o-j^  e  S— , 


But, 


r  r  2  r  i 

-^b  ^  '"-1 - -  %  **  *"1  - ' 


r  r 

<T  C,  <7  , 
b  -1 


Therefore, 


e  S-. 


-  a  -1  + 

and  <7^  e  S— .  A  similar  process  can  be  used  to  show  that  cr^  e  2- 

To  prove  the  theorem  in  the  case  where  both  moving  regions  extend  into  the  recurrent  re¬ 
gion,  we  need  only  show  that  6^^  ^  —  y^  if  ^  2—.  In  general,  6^  <  6  =  [x^.  If  6^  =  5  ,  we 

are  at  the  lower  boundary  of  a  strategy  interval,  and  the  proof  can  be  accomplished  in  the  pre¬ 
ceding  strategy  interval.  Therefore,  we  can  assume  that  6  ^  <  6  =  • 

r  r  ^  , 

If  both  O'  and  o’,  belong  to  2—,  then  ir  must  lie  to  the  left  of  P^,  (i.e.,  P  ^  Pn)-  If 
a  D  a  —  — , 

least“twu  intervals  belonging  to  the  no-move  region  lie  to  the  left  of  P„,  then  o'’  - 
r  2  "  0-1 


o^  and 


"b- 


Therefore, 
t  t  2

or  -*  cr  . - -  O' , 

a  -1  + 


If  TT  =  (P  ,  P«),  then  and 

a  -  0"  a  -1 


t  2 

17  - -  cr , 


In  either  case. 


Therefore,  W|_(P)  =  W  (P)  =  U  (P)  is  functionally  related  to  W|^(P)  exactly  as  W^‘(P)  = 
W^(P)  =  U^(P)  =  is' functionally  related  to  W^(P)  =  W^(P)  =  U^(P).  We  may  express  these 

payoffs  in  the  form 

W_(P)  =  a_P  +  b_(l  -  P) 

W^(P)  =a^P  +b^^(i-P)  , 

W*(P)  =  a^P  +  b*(l  -  P)  , 

W^(P)  =  a^P  +  b^(l  -  P) 

The  payoff  coefficients  must  be  related  as  follows: 

r 

a  =  X  +  y  a.  , 
a  •'a  b 

t 

a  =  X  +  y  a ,  , 

a  a  ■'a  + 


+  y^.K 


‘^a  =  ^b  +  yb‘’+ 


k  “T  X 

where  y^  is  of  the  form  r^  d  and  yj^  is  of  the  form  r^  d  where  k,  T  <  * 
If  both  r^  and  r^  are  greater  than  zero. 


5a<«-  =►^1 


■->^2  =  «+<*b 




If  one  of  the  r's  is  equal  to  zero,  the  properties 


W^(P_)  =  W_(P_) 

and 


W^(P^)  =  w^(p^) 

together  with 

<  6 
a 


If  both  r^  and  are  equal  to  zero,  the  assumed  case  in  which  P^^  <  P  <  Pq  cannot  occur. 

Therefore,  >— Also,  Hence  W^’*(P)  =  W^(P),  so  that  6^’'  =  6^.  Thus, 


In  order  to  complete  the  proof,  we  must  consider  the  case  where  one  but  not  both  of  the 
moving  regions  extends  into  the  recurrent  region.  Let  us  assume  that  P  <  P^^  and  P^  <  Pq2' 
In  this  case,  the  recurrent  chain  must  have  the  following  form: 


Here,  ct  ^  is  a  pure  state  and  is  the  interval  lying  in  the  recurrent  region  with  P^^  as  a 

boundary.  For  any  CTj  e  S— ,  the  function  W!(P)  =  U!(P).  Therefore, 6!^^  ^  6^  only  if  e  S+. 
For  any  such  state,  6^  ^  6!^  ^i  1"  belong  to  2+.  Hence,  —  ^  for 

each  recurrent  state  (tP, 


APPENDIX  D 

SOME PROPERTIES OF $G^0$


We must show that the following conditions hold in $G^0$.

(1) Define $P_0$ as a point belonging to the probability simplex that is a solution of the equations

$$U^0(P; 1) = U^0(P; 2) = \cdots = U^0(P; N)$$

At least one $P_0$ must exist, and each one that exists must belong to the interior of the simplex.

(2) All boxes are admissible if and only if there exists a $P_0$ that is the unique point at which $U^0(P)$ is a maximum. If this occurs, $P_0$ must also be the unique solution of

$$U^0(P; 1) = U^0(P; 2) = \cdots = U^0(P; N)$$

(3) If any inadmissible boxes exist, there must be at least one for which


This statement applies for any $P_0$.

(4) In the subsimplex generated by the admissible boxes, there exists a unique $P$ where $U^0(P) = V^0$.

(5) There exists a $Y$ belonging to the probability simplex with which the searcher can limit the evader to $V^0$. If box $i$ is inadmissible, $y_i = 0$. If box $i$ is admissible, $W^0(Y; i) = W^0(Y)$.

We have defined $U^0(P)$ by $U^0(P) = \min_i U^0(P; i)$, where


a  T. 

1  —  r.d  ^ 
J 


Consider some properties of $U^0(P; i)$. Let $S_N$ represent the probability simplex where, for all $i = 1, 2, \ldots, N$, $p_i \ge 0$ and $\sum_{i=1}^{N} p_i = 1$. Let $P^i$ represent the $i$th vertex of $S_N$, where $p_i = 1$.

A ray belonging to $S_N$ extending from $P^i$ intersects the opposite face at a point $P'$ where $p'_i = 0$. Along this ray, each component of $P$ other than $p_i$ can be expressed in the form $p_j = (1 - p_i)\, p'_j$, and
dU°(P;i)


(l  -  -  p.qj)]‘ 


-(l-r.d")r.  I  p.p. 

.  . 

T.  r  ’’•1 

+  <l-d  ‘)  |y.(p.  -  Tjj) -Qjd  I 

At any point along this ray except $P'$ (and at this point also if $\alpha > 0$), $1 - d^{\tau_i}(1 - p_i q_i) > 0$. Therefore, along a ray extending from $P^i$, $U^0(P; i)$ behaves monotonically. That is, as $p_i$ decreases, $U^0(P; i)$ must be monotonically increasing or decreasing, or equal to a constant. (If box $i$ does not dominate any other box, $U^0(P; i)$ must be monotonically increasing as $p_i$ decreases.)

Let $R_i(c)$ represent the hyperspace $U^0(P; i) = c$. This equation can be expressed in the form

Pi(Yi<Pi  -  hj)  -  q^d  +  c)]  +  yj  2  p.p^  =  c  . 

Therefore, $R_i(c)$ is a linear hyperplane. If

$$\min_{P \in S_N} U^0(P; i) < c < \max_{P \in S_N} U^0(P; i)$$

then $R_i(c)$ partitions $S_N$ into two nonempty hyperspaces of degree $N - 1$. Let us exclude $R_i(c)$ from each of these spaces. Then, $U^0(P; i)$ must be greater than $c$ over one of these spaces and less than $c$ over the other. This follows from the monotonic behavior of $U^0(P; i)$ along any ray in $S_N$ that intersects $P^i$. If $c$ is equal to

$$\max U^0(P; i) \quad \text{or} \quad \min U^0(P; i),$$

then $R_i(c)$ must include at least one vertex of $S_N$. Therefore, $U^0(P; i)$ achieves its maximum over $S_N$ at at least one vertex and also its minimum at at least one vertex.
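
The vertex property can be checked numerically under a simplifying assumption. The sketch below assumes, hypothetically, that a payoff function has the linear-fractional form f(P) = (a . P) / (b . P) with a positive denominator. This is one family of functions whose level sets are hyperplanes and which is monotone along rays; the coefficient vectors `a` and `b` are illustrative values and are not taken from this report.

```python
import random

def linear_fractional(p, a, b):
    """Evaluate f(P) = (a . P) / (b . P) for a point P on the simplex."""
    num = sum(ai * pi for ai, pi in zip(a, p))
    den = sum(bi * pi for bi, pi in zip(b, p))
    return num / den

def random_simplex_point(n, rng):
    """Draw a point with positive components that sum to one (not uniform)."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical coefficients; every entry of b is positive so that the
# denominator never vanishes on the simplex.
a = [3.0, 1.0, 5.0]
b = [2.0, 1.0, 4.0]

# At the k-th vertex the function equals a[k] / b[k].
vertex_values = [ai / bi for ai, bi in zip(a, b)]
lo, hi = min(vertex_values), max(vertex_values)

rng = random.Random(0)
samples = [linear_fractional(random_simplex_point(3, rng), a, b)
           for _ in range(10000)]

# Every sampled value lies between the smallest and largest vertex value.
inside = all(lo - 1e-12 <= s <= hi + 1e-12 for s in samples)
```

Under this assumption f(P) is a weighted average of the vertex values a[k] / b[k], with weights proportional to b[k] p[k], which is why no interior point can exceed the largest vertex value or fall below the smallest one.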

Let $\{A, A'\}$ represent a partition of the boxes into two sets. Let $S_A$ represent the subsimplex of $S_N$ where $\sum_{i \in A} p_i = 1$. Define $T_A$ as the hyperspace belonging to $S_N$ where

for all $i, j \in A$: $U^0(P; i) = U^0(P; j) = U^0(P; A)$

Let us show that there exists at least one $P_0$ belonging to $S_N$ where

for all $i = 1, 2, \ldots, N$: $U^0(P_0; i) = U^0(P_0)$

Such a point must belong to the interior of $S_N$, for at any point $P$ belonging to a boundary of $S_N$, there must exist at least one $p_i = 0$ and one $p_j > 0$. In this situation, $U^0(P; i) > U^0(P; j)$.

We shall first prove that this property is satisfied when $N = 2$. The simplex $S_2$ is then the unit interval on the real line, and we may write


.  Y.(P.,  -  tj.) -q.d  X,  p,  , 

U°(P  ;  1)  =  ~ i - i - - - 1  <  —  =  U”(P  ;  2) 

—  T,  a  ' 

1  -  r^d 
U'>(P‘’;E)  =


v,(p7  -  V,)  -q^d 


^2  ^2  2 

=  U“(P  ;  1) 


As a result, there must exist at least one $P_0$ belonging to $S_2$.
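
The two-box existence argument is an intermediate-value argument: the difference between the two payoff functions has opposite signs at the two vertices, so it must vanish somewhere in the interior. A minimal sketch of locating such a crossing by bisection follows. The functions `u1` and `u2` are hypothetical stand-ins chosen only to have the required signs at the endpoints (with `p` playing the role of the probability of box 1); they are not the report's payoff functions.

```python
def u1(p):
    """Hypothetical payoff if the searcher looks in box 1 (illustrative only)."""
    return (4.0 - 3.0 * p) / (1.0 + p)

def u2(p):
    """Hypothetical payoff if the searcher looks in box 2 (illustrative only)."""
    return (1.0 + 2.0 * p) / (2.0 - p)

def crossing(f, g, lo=0.0, hi=1.0, tol=1e-12):
    """Find a point where f equals g by bisection on f - g.

    Assumes f and g are continuous and that f - g changes sign on [lo, hi].
    """
    d_lo = f(lo) - g(lo)
    d_hi = f(hi) - g(hi)
    if d_lo * d_hi > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = f(mid) - g(mid)
        if d_lo * d_mid <= 0:
            hi = mid                 # the sign change is in the left half
        else:
            lo, d_lo = mid, d_mid    # the sign change is in the right half
    return 0.5 * (lo + hi)

# At p = 1 the first function is the smaller one; at p = 0 the second is.
p0 = crossing(u1, u2)
```

Bisection only establishes and locates one crossing; it says nothing about uniqueness, which the text treats separately.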

We can use induction to prove that at least one $P_0$ exists in $S_N$. Let $A$ include the first $N - 1$ boxes in a set of $N$ boxes. Assume that there exists a $P_{0A}$ belonging to $S_A$ where

for all $i \in A$: $U^0(P_{0A}; i) = U^0(P_{0A}; A)$.


Hence, we assume that $T_A$ intersects $S_A$. Also,

for all $i \in A$: $U^0(P^N; i) = \rho_N / \alpha$

Therefore, $T_A$ contains the vertex $P^N$. Even when $\alpha = 0$, the point $P^N$ is a bounding point of $T_A$, for $U^0(P; i)$ is bounded and continuous over the interior of $S_N$.

A simple manipulation reveals that

$$U^0(P^N; A) > U^0(P^N; N)$$

and

$$U^0(P_{0A}; A) < U^0(P_{0A}; N).$$

Therefore, there must be at least one point $P_0$ belonging to $T_A$ where

$$U^0(P_0; N) = U^0(P_0; A) = U^0(P_0).$$

Let us assume that all boxes are admissible. Then,

for all $i = 1, 2, \ldots, N$: $\rho_i / \alpha > \max U^0(P) = V^0$

Consider a point $P_0$ belonging to $S_N$ where

$$U^0(P_0) = V' \le V^0.$$

The intersection of $R_i(V')$ with $S_N$ includes the interior point $P_0$ and must partition $S_N$ into two nonempty hyperspaces of degree $N - 1$. But,

for all $j \ne i$: $U^0(P^j; i) = \rho_j / \alpha > V^0 \ge V'$

Therefore, all but the $i$th vertex are included in one hyperspace and $P^i$ is included in the other. Over the latter, $U^0(P; i) < V'$. This inequality is true for any $i$ and implies that for any $P \ne P_0$ belonging to $S_N$, there exist an $i$ and $j$ where $U^0(P; i) < V' < U^0(P; j)$. Therefore, $P_0$ must be the unique intersection in $S_N$ of $\{U^0(P; i)\}$. It must also be the unique point at which $U^0(P)$ is a maximum.
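
For two boxes, the claim that the intersection point is also the maximizer of the lower envelope can be illustrated with a coarse grid search. This is only a sketch: `u1` and `u2` are hypothetical functions (one decreasing and one increasing in `p`, the probability of box 1) and do not reproduce the report's payoff formulas.

```python
def u1(p):
    """Hypothetical payoff for a first look in box 1 (decreasing in p)."""
    return (4.0 - 3.0 * p) / (1.0 + p)

def u2(p):
    """Hypothetical payoff for a first look in box 2 (increasing in p)."""
    return (1.0 + 2.0 * p) / (2.0 - p)

def lower_envelope(p):
    """The evader's guaranteed payoff: the searcher picks the smaller value."""
    return min(u1(p), u2(p))

# Evaluate the lower envelope on a uniform grid over the unit interval and
# keep the grid point at which it is largest.
grid = [k / 10000 for k in range(10001)]
p_best = max(grid, key=lower_envelope)
v_best = lower_envelope(p_best)

# At the maximizer the two payoffs are nearly equal, up to grid resolution.
gap = abs(u1(p_best) - u2(p_best))
```

Because one function falls and the other rises, the envelope climbs along `u2` up to the crossing and then descends along `u1`, so the grid maximizer sits within one grid step of the crossing.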

Before proving the converse of this theorem, it is necessary to develop some additional properties concerning $\{U^0(P; i)\}$. Consider a partition $\{A, A'\}$ where $A$ contains more than one box. In this case, $P^i \in T_A$ for all $i \in A'$. Also, there exists at least one point belonging to $T_A \cap S_A$. Consider the hyperplane $R_A(c) = \bigcap_{i \in A} R_i(c)$. If

$$\min U^0(P; A) < c < \max U^0(P; A),$$

then $R_A(c)$ partitions $T_A$ into two nonempty hyperspaces. If the boundary $R_A(c)$ is excluded from both of these hyperspaces, $U^0(P; A)$ is greater than $c$ over one and less than $c$ over the other. It follows that $U^0(P; A)$ must be a maximum at a vertex $P^i$ with $i \in A'$ or at a point belonging to $T_A \cap S_A$.

Similarly, it must be a minimum at at least one such point.

Let us assume that there exists a $P_0$ that is the unique point belonging to $S_N$ at which $U^0(P) = V^0$.

Let $A$ include all but the $N$th box. Let $P_{0A}$ represent a point belonging to $T_A \cap S_A$. The function $U^0(P_{0A}; N) > U^0(P_{0A}; A)$. Therefore, $U^0(P_{0A}; A) < U^0(P_0; A) = V^0$. This is true for any $P_{0A}$ belonging to $T_A \cap S_A$. Hence, $U^0(P; A)$ must be a maximum at $P^N$. The hyperplane $R_A(V^0)$ cannot contain $P^N$ as well as $P_0$, for it would then be of degree one and intersect $S_A$. Therefore,

$$\rho_N / \alpha = U^0(P^N; A) > U^0(P_0; A) = V^0$$

Similarly, $\rho_i / \alpha > V^0$ for each box, and all boxes must be admissible. Therefore, if $U^0(P)$ is a maximum at a unique point that is some $P_0$, all boxes are admissible and $P_0$ is the unique point where, for all $i$, $U^0(P; i) = U^0(P)$.

Let us consider the case where at least one box is inadmissible. We must first show that $U^0(P)$ will be a maximum at at least one point where, for some $i$, $U^0(P; i) > U^0(P)$. Let $B$ represent the set of admissible boxes and $B'$ the set of inadmissible boxes. Then,

for all $i \in B$: $\rho_i / \alpha > V^0$;

for all $i \in B'$: $\rho_i / \alpha \le V^0$

Consider a point $P \in S_B$.

For all $j \in B'$: $U^0(P; j) > \min_{i \in B} U^0(P; i)$

Let $P_{0B}$ be a point belonging to $S_B$ at which

$$\min_{i \in B} U^0(P_{0B}; i) = \max_{P \in S_B} \left\{ \min_{i \in B} U^0(P; i) \right\} \equiv V_B^0$$

Then, $V_B^0 \le V^0$ and

for all $i \in B$: $\rho_i / \alpha > V^0 \ge V_B^0$

Therefore, all boxes belonging to $B$ are also admissible in the reduced game involving only those boxes. Hence, $P_{0B}$ is the unique point belonging to $T_B \cap S_B$.

Consider a $P_0 \in S_N$ where, for all $i = 1, 2, \ldots, N$, $U^0(P_0; i) = U^0(P_0)$, and assume that $V_B^0 < U^0(P_0)$. Define $B''$ as the subset of $B'$ where

for all $i \in B''$: $\rho_i = \max_{j \in B'} \rho_j$

Then, $U^0(P; B)$ is a maximum over $S_{B''}$, which does not include $P_0$. Therefore, for any $i \in B''$, $U^0(P_0) = U^0(P_0; B) < \rho_i / \alpha \le V^0$.

If $V_B^0$ is less than $U^0(P_0)$, then $U^0(P)$ cannot be a maximum at $P_0$.

If $U^0(P_0) = V_B^0$, then $U^0(P_0) \le V^0$. In this situation, $U^0(P)$ will be a maximum either at $P_{0B}$ as well as $P_0$ or at neither point. Therefore, if any inadmissible boxes exist, $U^0(P)$ must be a maximum at at least one point where all the functions $\{U^0(P; i)\}$ do not intersect.

Assume that $U^0(P) = V^0$ at a point where

for all $j \ne i$: $U^0(P; j) > U^0(P; i) = U^0(P)$

But, $U^0(P; i)$ is a maximum at at least one vertex, and $P^i$ is the only vertex where $U^0(P; i) = U^0(P)$. Therefore, $U^0(P)$ is a maximum at $P^i$ and box $i$ dominates all of the other boxes. The function $U^0(P; i)$ must be a minimum at at least one vertex other than $P^i$, and there exists at least one inadmissible box where

$$\rho_j / \alpha < U^0(P_0) \le V^0.$$

If no box dominates all the others, and if at least one box is inadmissible, $U^0(P)$ must be a maximum at at least one point belonging to a subspace $T_A$. Here, $A$ must include at least two boxes, but we can choose it so that it does not include all $N$. Furthermore, the point in $T_A$ can be chosen so that

for all $i \notin A$: $U^0(P; i) > U^0(P; A) = V^0$.

The function $U^0(P; A)$ must be a maximum at a point belonging to $T_A \cap S_A$ or at a vertex $P^i$ not belonging to $S_A$. At each of these vertexes, $U^0(P^i; A) > U^0(P^i; i)$. Therefore, there exists a point $P_{0A} \in T_A \cap S_A$ where $U^0(P_{0A}; A) = V^0$. Also,


for all $i \notin A$: $\rho_i / \alpha = U^0(P^i; A) \le U^0(P_{0A}; A) = V^0$.


Hence, $A$ must contain all of the admissible boxes, and for any $P_0$ in $S_N$ there must be at least one $i \notin A$ where

$$\rho_i / \alpha \le U^0(P_0) \le V^0.$$

The set $A$ could have been chosen so that it would contain some inadmissible boxes. However, it contains all of the admissible boxes and there exists a $P_{0A} \in T_A \cap S_A$ where $U^0(P_{0A}) = V^0$.

Therefore, we can consider the reduced game involving only those boxes belonging to $A$, and repeat the process. Each time this is done, at least one inadmissible box is removed. Eventually only $B$, the set of admissible boxes, can remain. In $S_B$, the intersection $T_B \cap S_B$ consists of a unique point and we can now state that $U^0(P_{0B}; B) = U^0(P_{0B}) = V^0$. Also, for any $P_0$ in $S_N$,

there must exist at least one inadmissible box for which
Let us now show that there exists a good strategy for the searcher. This will imply that $V^0$ is the value of $G^0$. Assume that the evader first hides with probability $P \in S_B$ and that the searcher looks first into a box $i$ that belongs to $B$. Assume further that the evader always uses the optimum strategy from then on and that the searcher never looks into an inadmissible box. Let $U^{0\prime}(P; i)$ represent the resulting payoff:

N 

U°'(P;i)  =  Vj  2j  PjPj  —  Pi(ViPi  +  ‘Ijd  j+  (1  —  p.qj)d  V° 

3=1 

But, $U^{0\prime}(P; i)$ is linear in $P$ over $S_B$. Furthermore,

for all $i \in B$: $U^{0\prime}(P_{0B}; i) = V^0$.

Since $P_{0B}$ is the unique point in $S_B$ where $\min_{i \in B} U^0(P; i) = V^0$,

for all $i \in B$: $U^0(P^i; i) < V^0$

As a result, there must exist a unique probability vector $Y_0 \in S_B$ where

for all $P \in S_B$: $\sum_{i \in B} y_{0i}\, U^{0\prime}(P; i) = V^0$.

If the searcher uses $Y_0$ to determine each look, the payoff will equal $V^0$ as long as the evader never hides in an inadmissible box. If he does hide in such a box, the payoff cannot be greater, for $\rho_j / \alpha \le V^0$ for each such box. Therefore, $Y_0$ limits the evader to $V^0$, a payoff that the evader can guarantee. The searcher's good strategy is defined by $Y_0$, which can be calculated by using the techniques suggested in Sec. 8.3.
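
Because each one-look payoff is linear in $P$ over the subsimplex of admissible boxes, the equalizing condition on the searcher's mixture reduces to a small linear system: the mixed payoff must equal the same constant at every vertex, and the mixture weights must sum to one. The sketch below solves that system exactly for a hypothetical 2 x 2 payoff table `c`, where `c[i][j]` stands for the payoff when the searcher looks in box `i` and the evader is at vertex `j`. The table values and the helper names are illustrative assumptions; this is not the procedure of Sec. 8.3.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Raises StopIteration if the matrix is singular (no nonzero pivot found).
    """
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def equalizing_mix(c):
    """Return (y, v) with sum_i y[i] * c[i][j] = v for every j and sum(y) = 1."""
    n = len(c)
    # One equation per vertex j: sum_i y[i] * c[i][j] - v = 0.
    rows = [[c[i][j] for i in range(n)] + [-1] for j in range(n)]
    # Normalization: the weights sum to one.
    rows.append([1] * n + [0])
    rhs = [0] * n + [1]
    sol = solve_linear(rows, rhs)
    return sol[:n], sol[n]

# Hypothetical payoff table for two admissible boxes.
c = [[1, 4],
     [3, 0]]
y, v = equalizing_mix(c)
```

A solution of the linear system is a valid mixed strategy only if every weight is nonnegative, which should be checked separately for any other table.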